SlideShare une entreprise Scribd logo
1  sur  35
Télécharger pour lire hors ligne
SQL Analytics
powering Telemetry
Analysis at Comcast
Suraj Nesamani,
Principal Engineer, RDK Analytics @
Comcast
and
Molly Nagamuthu,
Sr Resident Solutions Architect @
Databricks
Agenda
▪ Introduction to the Lakehouse
Platform
▪ SQL Analytics - under the hood
by Molly Nagamuthu
▪ RDK challenges at Comcast
▪ SQL Analytics Test and Learn
by Suraj Nesamani
20+ years in Product Development,
Engineering & Professional Services
Telecom
Healthcare & Media
Finance
Molly Nagamuthu
Sr Resident Solutions Architect @ Databricks
Databricks’ vision is to
enable data-driven
innovation to all
enterprises
Lakehouse Platform
Today, most enterprises struggle with data
Siloed stacks increase data architecture complexity
Data Warehousing Data Engineering Streaming Data Science & Machine Learning
Extract Load
Transform
Streaming data sources
Streaming Data Engine
Real-time Database
Analytics and BI
Data marts
Data warehouse
Structured data
Structured, semi-structured
and unstructured data
Structured, semi-structured
and unstructured data
Data Lake
Data prep
Data Lake
Machine
Learning
Data
Science
Today, most enterprises struggle with data
Siloed stacks increase data architecture complexity
Data Warehousing Data Engineering Streaming Data Science & Machine Learning
Extract Load
Transform
Streaming data sources
Streaming Data Engine
Real-time Database
Analytics and BI
Data marts
Data warehouse
Structured data
Structured, semi-structured
and unstructured data
Structured, semi-structured
and unstructured data
Data Lake
Data prep
Data Lake
Machine
Learning
Data
Science
Amazon Redshift Teradata
Azure Synapse Google BigQuery
Snowflake IBM Db2
SAP Oracle Autonomous
Data Warehouse
Hadoop Apache Airflow
Amazon EMR Apache Spark
Google Dataproc Cloudera
Jupyter Amazon SageMaker
Azure ML Studio MatLAB
Domino Data Labs SAS
TensorFlow PyTorch
Apache Kafka Apache Spark
Apache Flink Amazon Kinesis
Azure Stream Analytics Google Dataflow
Tibco Spotfire Confluent
Disconnected systems and proprietary data formats make integration di cult
Today, most enterprises struggle with data
Siloed stacks increase data architecture complexity
Data Warehousing Data Engineering Streaming Data Science & Machine Learning
Extract Load
Transform
Streaming data sources
Streaming Data Engine
Real-time Database
Analytics and BI
Data marts
Data warehouse
Structured data
Structured, semi-structured
and unstructured data
Structured, semi-structured
and unstructured data
Data Lake
Data prep
Data Lake
Machine
Learning
Data
Science
Amazon Redshift Teradata
Azure Synapse Google BigQuery
Snowflake IBM Db2
SAP Oracle Autonomous
Data Warehouse
Hadoop Apache Airflow
Amazon EMR Apache Spark
Google Dataproc Cloudera
Jupyter Amazon SageMaker
Azure ML Studio MatLAB
Domino Data Labs SAS
TensorFlow PyTorch
Apache Kafka Apache Spark
Apache Flink Amazon Kinesis
Azure Stream Analytics Google Dataflow
Tibco Spotfire Confluent
Disconnected systems and proprietary data formats make integration di cult
Data Scientists
Data Engineers
Data Analysts Data Engineers
Siloed data teams decrease productivity
Structured Semi-structured Unstructured Streaming
Lakehouse Platform
Data Engineering
BI & SQL
Analytics
Real-time Data
Applications
Data Science
& Machine Learning
Data Management & Governance
Open Data Lake
SIMPLE OPEN COLLABORATIVE
SQL Analytics
Databricks SQL Analytics
Delivering analytics on the freshest data
with data warehouse performance and
data lake economics
■ Query your lakehouse with better price /
performance
■ Simplify discovery and sharing of new insights
■ Connect to familiar BI tools, like Tableau or Power BI
■ Simplify administration and governance
Broad integration with BI tools
Connect your preferred BI tools with
optimized connectors that provide
fast performance, low latency, and
high user conconcurrency to your data
lake for your existing BI tools.
Coming soon:
Use Cases
Collaborative exploratory
data analysis on your
data lake
Data-enhanced
applications
Connect existing BI tools
and use one source of
truth for all your data
Connect your existing BI tools to your data
lake with optimized connectors and
ODBC/JDBC Drivers
Databricks SQL Analytics
Under the hood: SQL Analytics
SQL Endpoints
Vectorized Query Engine
Build a curated cloud data lake on
an open format with Delta Lake
Filtered, Cleaned,
Augmented
Silver
Raw Ingestion
and History
Bronze
Business-level
Aggregates
Gold
Curated Data
Structured, Semi-Structured, and Unstructured Data
BI/SQL Connectors
SQL Interface
Next generation query engine providers
real-life performance for all queries
Based on Redash, query your entire
data lake with SQL and visualize results
Quickly setup SQL / BI optimized compute with
best price/performance and track usage
Demo
Cluster sizecheck your cloud provider quotas!
Check documentation for latest cluster size:
https://docs.sql.azuredatabricks.net/sql/admin/sql-endpoints.html
https://docs.sql.databricks.com/sql/admin/sql-endpoints.html
Additional Resources
➢ https://docs.databricks.com/sql/index.html
➢ https://databricks.com/product/data-lakehouse
➢ https://databricks.com/discover/demos/sql-analytics
➢ Customer Success Offering SQL Analytics MVP available in Q2
Related Talks
WEDNESDAY
• 03:50 PM (PT): Databricks SQL Analytics Deep Dive for the Data Analyst - Doug Bateman, Databricks
• 04:25 PM (PT): Radical Speed for SQL Queries on Databricks: Photon Under the Hood - Greg Rahn & Alex Behm,
Databricks
• 04:25 PM (PT): Delivering Insights from 20M+ Smart Homes with 500M+ devices - Sameer Vaidya, Plume
THURSDAY
• 11:00 AM (PT): Getting Started with Databricks SQL Analytics - Simon Whiteley, Advancing Analytics
• 03:15 PM (PT): Building Lakehouses on Delta Lake and SQL Analytics - A Primer - Franco Patano, Databricks
FRIDAY
• 10:30 AM (PT): SQL Analytics Powering Telemetry Analysis at Comcast - Suraj Nesamani, Comcast
& Molly Nagamuthu, Databricks
Reference Design Kit (RDK)
Suraj Nesamani
Principal Engineer, RDK@Comcast
15 plus years of experience in Engineering,
Mostly specializing in RDK Telemetry and
Big Data Analysis.
Experienced in handling Peta-Byte scale
IoT data
The RDK is a Whole Home Open Source Software Platform
powering Video, Broadband and IoT Devices.
It enables operators to manage devices and easily customize their
UI’s and Apps, and provides analytics to improve the customer
experience and drive business results.
➢ https://rdkcentral.com/
RDK - Reference Design Kit
Raw
Telemetry
Data
Raw
Telemetry
Data
Formatted
Data
Copy data
to Redshift
RDK Telemetry Pipeline
Node : dc2.8xlarge
Cluster Size : 12 nodes
Queries executed : 1000 per day
CPU Usage :80% most of the time
Disk Usage : 60%
Storage Capacity
Scalable
Complex Queries
Pricing
Data retention
Sort and distribution keys
Concurrent Execution
AWS Redshift
RDK Analytics Challenge
● Queries that take more CPU and time in cluster
● Workload Management (WLM ) Aborts
● Expensive to add more nodes
● Query output still needed to run the business
RDK-Databricks Partnership
● Migrated a complex redshift pipeline using Spark 3.0 on databricks
● Migrated some EMR workloads
● Optimizations and Databricks training
● Delta Lake Test and Learn
● Upgraded to E2 version of the Databricks Platform - Which is more secure,
scalable and simpler to manage.
The Quest
SQL Analytics-(Test and Learn)-Scope
Analyze Redshift workloads
• 10 slowest performing queries on redshift
• Average runtime for each of these queries is 30 mins
• Currently use 12 dc2.8xl nodes
Disclaimer: For the Test and Learn, we could not reproduce the redshift production environment. So we only
considered the query runtimes for comparison.
Test and Learn Design
Timeline 2-3 weeks
Migrate workloads to Databricks SQL Analytics
• Metastore decisions
• Table Access Control List (ACLs)
• Convert redshift queries to spark sql
• Test workloads, as is , in parquet format
• Convert to Delta Lake and test against Photon
Results
Observations
The Native SQL interface was very intuitive and easy to use
Creating endpoints is extremely simplified. It helps SQL
analysts to a great extent.
Does not have Support for UDFs in SQL Analytics
We did not test ACLs too much but it seemed simple enough.
a centralized catalog would be really nice
Looking forward to ...
Databricks Unity Catalog - Coming soon!
S3 Data Source
GCS Data Source
Cluster or
SQL Endpoint
Database Data Source
Data source
credentials
Users
Unity Catalog
SQL GRANT
permissions
Audit
log
Thank you !
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.

Contenu connexe

Tendances

Data platform architecture
Data platform architectureData platform architecture
Data platform architectureSudheer Kondla
 
Azure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene PolonichkoAzure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene PolonichkoDimko Zhluktenko
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouseJames Serra
 
Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks FundamentalsDalibor Wijas
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks
 
Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)Databricks
 
Getting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsGetting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsDatabricks
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure DatabricksJames Serra
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMatei Zaharia
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar ZecevicDataScienceConferenc1
 
Architecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleArchitecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleDatabricks
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...Databricks
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseData Con LA
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureDmitry Anoshin
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineeringThang Bui (Bob)
 

Tendances (20)

Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
 
Azure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene PolonichkoAzure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene Polonichko
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With Data
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)
 
Getting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsGetting Started with Databricks SQL Analytics
Getting Started with Databricks SQL Analytics
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
 
Architecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleArchitecting Agile Data Applications for Scale
Architecting Agile Data Applications for Scale
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 

Similaire à SQL Analytics Powering Telemetry Analysis at Comcast

Building End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPBuilding End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPDatabricks
 
Unlocking the Value of Your Data Lake
Unlocking the Value of Your Data LakeUnlocking the Value of Your Data Lake
Unlocking the Value of Your Data LakeDATAVERSITY
 
Analyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data LakeAnalyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data LakeBizTalk360
 
2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3 2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3 Chester Chen
 
Powering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakePowering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakeDatabricks
 
What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3Databricks
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksDatabricks
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?DATAVERSITY
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Databricks
 
Analyzing StackExchange Data with Azure Data Lake (Tom Kerkhove @ Integration...
Analyzing StackExchange Data with Azure Data Lake (Tom Kerkhove @ Integration...Analyzing StackExchange Data with Azure Data Lake (Tom Kerkhove @ Integration...
Analyzing StackExchange Data with Azure Data Lake (Tom Kerkhove @ Integration...Codit
 
Integration Monday - Analysing StackExchange data with Azure Data Lake
Integration Monday - Analysing StackExchange data with Azure Data LakeIntegration Monday - Analysing StackExchange data with Azure Data Lake
Integration Monday - Analysing StackExchange data with Azure Data LakeTom Kerkhove
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationDatabricks
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Kent Graziano
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerDatabricks
 
Cloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeCloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeDatabricks
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21JDA Labs MTL
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT_MTL
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeTorsten Steinbach
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Delivering business insights and automation utilizing aws data services
Delivering business insights and automation utilizing aws data servicesDelivering business insights and automation utilizing aws data services
Delivering business insights and automation utilizing aws data servicesBhuvaneshwaran R
 

Similaire à SQL Analytics Powering Telemetry Analysis at Comcast (20)

Building End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPBuilding End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCP
 
Unlocking the Value of Your Data Lake
Unlocking the Value of Your Data LakeUnlocking the Value of Your Data Lake
Unlocking the Value of Your Data Lake
 
Analyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data LakeAnalyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data Lake
 
2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3 2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3
 
Powering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakePowering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta Lake
 
What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
 
Analyzing StackExchange Data with Azure Data Lake (Tom Kerkhove @ Integration...
Analyzing StackExchange Data with Azure Data Lake (Tom Kerkhove @ Integration...Analyzing StackExchange Data with Azure Data Lake (Tom Kerkhove @ Integration...
Analyzing StackExchange Data with Azure Data Lake (Tom Kerkhove @ Integration...
 
Integration Monday - Analysing StackExchange data with Azure Data Lake
Integration Monday - Analysing StackExchange data with Azure Data LakeIntegration Monday - Analysing StackExchange data with Azure Data Lake
Integration Monday - Analysing StackExchange data with Azure Data Lake
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop Migration
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Cloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeCloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data Lake
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lake
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Delivering business insights and automation utilizing aws data services
Delivering business insights and automation utilizing aws data servicesDelivering business insights and automation utilizing aws data services
Delivering business insights and automation utilizing aws data services
 

Plus de Databricks

Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionDatabricks
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityDatabricks
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueDatabricks
 

Plus de Databricks (20)

Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and Quality
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
 

Dernier

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 

Dernier (20)

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 

SQL Analytics Powering Telemetry Analysis at Comcast

  • 1. SQL Analytics powering Telemetry Analysis at Comcast Suraj Nesamani, Principal Engineer, RDK Analytics @ Comcast and Molly Nagamuthu, Sr Resident Solutions Architect @ Databricks
  • 2. Agenda ▪ Introduction to the Lakehouse Platform ▪ SQL Analytics - under the hood by Molly Nagamuthu ▪ RDK challenges at Comcast ▪ SQL Analytics Test and Learn by Suraj Nesamani
  • 3. 20+ years in Product Development, Engineering & Professional Services Telecom Healthcare & Media Finance Molly Nagamuthu Sr Resident Solutions Architect @ Databricks
  • 4. Databricks’ vision is to enable data-driven innovation to all enterprises
  • 6. Today, most enterprises struggle with data Siloed stacks increase data architecture complexity Data Warehousing Data Engineering Streaming Data Science & Machine Learning Extract Load Transform Streaming data sources Streaming Data Engine Real-time Database Analytics and BI Data marts Data warehouse Structured data Structured, semi-structured and unstructured data Structured, semi-structured and unstructured data Data Lake Data prep Data Lake Machine Learning Data Science
  • 7. Today, most enterprises struggle with data Siloed stacks increase data architecture complexity Data Warehousing Data Engineering Streaming Data Science & Machine Learning Extract Load Transform Streaming data sources Streaming Data Engine Real-time Database Analytics and BI Data marts Data warehouse Structured data Structured, semi-structured and unstructured data Structured, semi-structured and unstructured data Data Lake Data prep Data Lake Machine Learning Data Science Amazon Redshift Teradata Azure Synapse Google BigQuery Snowflake IBM Db2 SAP Oracle Autonomous Data Warehouse Hadoop Apache Airflow Amazon EMR Apache Spark Google Dataproc Cloudera Jupyter Amazon SageMaker Azure ML Studio MatLAB Domino Data Labs SAS TensorFlow PyTorch Apache Kafka Apache Spark Apache Flink Amazon Kinesis Azure Stream Analytics Google Dataflow Tibco Spotfire Confluent Disconnected systems and proprietary data formats make integration di cult
  • 8. Today, most enterprises struggle with data Siloed stacks increase data architecture complexity Data Warehousing Data Engineering Streaming Data Science & Machine Learning Extract Load Transform Streaming data sources Streaming Data Engine Real-time Database Analytics and BI Data marts Data warehouse Structured data Structured, semi-structured and unstructured data Structured, semi-structured and unstructured data Data Lake Data prep Data Lake Machine Learning Data Science Amazon Redshift Teradata Azure Synapse Google BigQuery Snowflake IBM Db2 SAP Oracle Autonomous Data Warehouse Hadoop Apache Airflow Amazon EMR Apache Spark Google Dataproc Cloudera Jupyter Amazon SageMaker Azure ML Studio MatLAB Domino Data Labs SAS TensorFlow PyTorch Apache Kafka Apache Spark Apache Flink Amazon Kinesis Azure Stream Analytics Google Dataflow Tibco Spotfire Confluent Disconnected systems and proprietary data formats make integration di cult Data Scientists Data Engineers Data Analysts Data Engineers Siloed data teams decrease productivity
  • 9. Structured Semi-structured Unstructured Streaming Lakehouse Platform Data Engineering BI & SQL Analytics Real-time Data Applications Data Science & Machine Learning Data Management & Governance Open Data Lake SIMPLE OPEN COLLABORATIVE
  • 11. Databricks SQL Analytics Delivering analytics on the freshest data with data warehouse performance and data lake economics ■ Query your lakehouse with better price / performance ■ Simplify discovery and sharing of new insights ■ Connect to familiar BI tools, like Tableau or Power BI ■ Simplify administration and governance
  • 12. Broad integration with BI tools Connect your preferred BI tools with optimized connectors that provide fast performance, low latency, and high user conconcurrency to your data lake for your existing BI tools. Coming soon:
  • 13. Use Cases Collaborative exploratory data analysis on your data lake Data-enhanced applications Connect existing BI tools and use one source of truth for all your data
  • 14. Connect your existing BI tools to your data lake with optimized connectors and ODBC/JDBC Drivers Databricks SQL Analytics Under the hood: SQL Analytics SQL Endpoints Vectorized Query Engine Build a curated cloud data lake on an open format with Delta Lake Filtered, Cleaned, Augmented Silver Raw Ingestion and History Bronze Business-level Aggregates Gold Curated Data Structured, Semi-Structured, and Unstructured Data BI/SQL Connectors SQL Interface Next generation query engine providers real-life performance for all queries Based on Redash, query your entire data lake with SQL and visualize results Quickly setup SQL / BI optimized compute with best price/performance and track usage
  • 15. Demo
  • 16. Cluster sizecheck your cloud provider quotas! Check documentation for latest cluster size: https://docs.sql.azuredatabricks.net/sql/admin/sql-endpoints.html https://docs.sql.databricks.com/sql/admin/sql-endpoints.html
  • 17. Additional Resources ➢ https://docs.databricks.com/sql/index.html ➢ https://databricks.com/product/data-lakehouse ➢ https://databricks.com/discover/demos/sql-analytics ➢ Customer Success Offering SQL Analytics MVP available in Q2
  • 18. Related Talks WEDNESDAY • 03:50 PM (PT): Databricks SQL Analytics Deep Dive for the Data Analyst - Doug Bateman, Databricks • 04:25 PM (PT): Radical Speed for SQL Queries on Databricks: Photon Under the Hood - Greg Rahn & Alex Behm, Databricks • 04:25 PM (PT): Delivering Insights from 20M+ Smart Homes with 500M+ devices - Sameer Vaidya, Plume THURSDAY • 11:00 AM (PT): Getting Started with Databricks SQL Analytics - Simon Whiteley, Advancing Analytics • 03:15 PM (PT): Building Lakehouses on Delta Lake and SQL Analytics - A Primer - Franco Patano, Databricks FRIDAY • 10:30 AM (PT): SQL Analytics Powering Telemetry Analysis at Comcast - Suraj Nesamani, Comcast & Molly Nagamuthu, Databricks
  • 20. Suraj Nesamani Principal Engineer, RDK@Comcast 15 plus years of experience in Engineering, Mostly specializing in RDK Telemetry and Big Data Analysis. Experienced in handling Peta-Byte scale IoT data
  • 21. The RDK is a Whole Home Open Source Software Platform powering Video, Broadband and IoT Devices. It enables operators to manage devices and easily customize their UI’s and Apps, and provides analytics to improve the customer experience and drive business results. ➢ https://rdkcentral.com/ RDK - Reference Design Kit
  • 23. Node : dc2.8xlarge Cluster Size : 12 nodes Queries executed : 1000 per day CPU Usage :80% most of the time Disk Usage : 60%
  • 24. Storage Capacity Scalable Complex Queries Pricing Data retention Sort and distribution keys Concurrent Execution AWS Redshift
  • 25. RDK Analytics Challenge ● Queries that take more CPU and time in cluster ● Workload Management (WLM ) Aborts ● Expensive to add more nodes ● Query output still needed to run the business
  • 26. RDK-Databricks Partnership ● Migrated a complex redshift pipeline using Spark 3.0 on databricks ● Migrated some EMR workloads ● Optimizations and Databricks training ● Delta Lake Test and Learn ● Upgraded to E2 version of the Databricks Platform - Which is more secure, scalable and simpler to manage.
  • 28. SQL Analytics-(Test and Learn)-Scope Analyze Redshift workloads • 10 slowest performing queries on redshift • Average runtime for each of these queries is 30 mins • Currently use 12 dc2.8xl nodes Disclaimer: For the Test and Learn, we could not reproduce the redshift production environment. So we only considered the query runtimes for comparison.
  • 29. Test and Learn Design Timeline 2-3 weeks Migrate workloads to Databricks SQL Analytics • Metastore decisions • Table Access Control List (ACLs) • Convert redshift queries to spark sql • Test workloads, as is , in parquet format • Convert to Delta Lake and test against Photon
  • 31. Observations The Native SQL interface was very intuitive and easy to use Creating endpoints is extremely simplified. It helps SQL analysts to a great extent. Does not have Support for UDFs in SQL Analytics We did not test ACLs too much but it seemed simple enough. a centralized catalog would be really nice
  • 33. Databricks Unity Catalog - Coming soon! S3 Data Source GCS Data Source Cluster or SQL Endpoint Database Data Source Data source credentials Users Unity Catalog SQL GRANT permissions Audit log
  • 35. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.