The presentation discusses how to migrate expensive open source big data workloads to Azure and leverage the latest compute and storage innovations in Azure Synapse with Azure Data Lake Storage to develop powerful, cost-effective analytics solutions. It shows how you can bring your .NET expertise to bear with .NET for Apache Spark, and how the shared metadata experience in Synapse makes it easy to create a table in Spark and query it from T-SQL.
4. Traditional on-prem analytics pipeline
Diagram labels: business/custom apps; operational databases; ETL; enterprise data warehouse; data marts; reporting; analytics; data mining.
5. Modern data warehouse
Diagram labels: sources: logs (structured), media (unstructured), files (unstructured), business/custom apps (structured); stages: Ingest, Store, Prep & train, Model & serve; services: Azure Data Factory, Azure Data Lake Storage, Azure Databricks, Azure SQL Data Warehouse, Power BI.
6. Modern data warehouse with Azure Synapse
Diagram labels: sources: logs (structured), media (unstructured), files (unstructured), business/custom apps (structured); Azure Synapse Analytics; Power BI; store: Azure Data Lake Storage.
7. Modern data warehouse with Azure Synapse
Diagram labels: sources: logs (structured), media (unstructured), files (unstructured), business/custom apps (structured); unified experience: Synapse Studio; analytics runtimes: SQL; common data estate: shared metadata; store: Azure Data Lake Storage; Power BI.
8. Cost optimization with Azure Data Lake Storage
Disaggregated compute and storage with shared metadata layer
Lifecycle management for optimizing TCO
Lower compute resources because of high performance
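To make the lifecycle management point concrete, here is a minimal sketch of an age-based tiering rule in plain Python. It only illustrates the idea of moving rarely touched data to cheaper tiers; the thresholds and the `choose_tier` function are hypothetical and are not part of any Azure SDK. Real lifecycle policies are configured on the storage account.

```python
from datetime import date

# Hypothetical thresholds (days since last modification). Real policies are
# configured per storage account and will differ.
COOL_AFTER_DAYS = 30
ARCHIVE_AFTER_DAYS = 180


def choose_tier(last_modified: date, today: date) -> str:
    """Return the access tier a simple age-based rule would pick."""
    age_days = (today - last_modified).days
    if age_days >= ARCHIVE_AFTER_DAYS:
        return "archive"
    if age_days >= COOL_AFTER_DAYS:
        return "cool"
    return "hot"


if __name__ == "__main__":
    today = date(2021, 1, 1)
    for modified in (date(2020, 12, 25), date(2020, 11, 1), date(2020, 1, 1)):
        print(modified, choose_tier(modified, today))
```

The rule checks the oldest threshold first so that a file old enough for the archive tier is never reported as merely cool.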
9. .NET for Apache Spark and Azure Synapse
First-class C# and F# bindings to Apache Spark, bringing the power of big data analytics to .NET developers
Apache Spark 2.4/3.0: DataFrames, Structured Streaming, Delta Lake
Performance optimized with Apache Arrow and hardware vectorization
Learn more at http://dot.net/Spark
First-class integration in Azure Synapse: batch submission, interactive .NET notebooks
.NET Standard 2.0; C# and F#; ML.NET
10. Demo: .NET for Spark and shared metadata experience in Azure Synapse
Michael Rys, @MikeDoesBigData
Twitter CSV files
Data prep with .NET for Spark
Analysis with interactive .NET for Spark Notebook
Seamless analysis with SQL
What has Michael been up to? Mentions, topics
Who was interacting with Michael?
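The questions in this demo (mentions, topics, and who interacted with a user) come down to extracting @handles and #hashtags from tweet text and counting them. The demo itself uses .NET for Spark; the sketch below swaps in plain Python with the standard library purely to show the logic. The column names, the sample rows, and the `analyze` function are hypothetical and are not taken from the demo's data.

```python
import csv
import io
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")

# Hypothetical sample rows standing in for the Twitter CSV files.
SAMPLE_CSV = """author,text
MikeDoesBigData,Talking about #Spark and #Synapse with @dotnet
someone,@MikeDoesBigData great #Spark demo
MikeDoesBigData,More #Spark with @dotnet and @AzureSynapse
"""


def analyze(csv_text, user):
    """Count who `user` mentions, which topics they use, and who mentions them."""
    mentions, topics, interactors = Counter(), Counter(), Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["author"] == user:
            # Tweets written by the user: collect their mentions and hashtags.
            mentions.update(MENTION.findall(row["text"]))
            topics.update(HASHTAG.findall(row["text"]))
        elif user in MENTION.findall(row["text"]):
            # Tweets by others that mention the user.
            interactors[row["author"]] += 1
    return mentions, topics, interactors


if __name__ == "__main__":
    mentions, topics, interactors = analyze(SAMPLE_CSV, "MikeDoesBigData")
    print("Mentions:", mentions.most_common())
    print("Topics:", topics.most_common())
    print("Interacting:", interactors.most_common())
```

In a Spark job the same steps would typically be expressed as column transformations followed by a group-and-count, with the result saved as a table so that it can be queried from SQL.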
11. Guidance from experts
Microsoft Docs
Explore overviews, tutorials, code samples, and more.
Azure Data Lake Storage: https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-introduction
Azure Synapse Analytics: https://docs.microsoft.com/azure/synapse-analytics
.NET for Apache Spark: https://dot.net/Spark
Editor's notes
Establish the baseline
The need to provision for max utilization/max consumption
Architectural brittleness in moving data across physical stores
Pay for consumption model
Compute elasticity
Data evolves ‘in place’ within ubiquitous storage service
Encapsulates the MDW pattern within the Synapse service
Retain benefits of pay for consumption & ubiquitous store
Unified experience leveraging a heterogeneous set of tools/frameworks
Shared metadata service means that table definitions do not need to be restated as pipeline flows
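The contrast in the notes between provisioning for maximum utilization and paying for consumption can be illustrated with a small worked calculation. All numbers below are hypothetical, and the sketch assumes the same unit price under both models, which real pricing may not match.

```python
# Hypothetical hourly demand in compute units over one day (24 values).
HOURLY_DEMAND = [2, 2, 2, 2, 2, 2, 4, 8, 10, 10, 8, 6,
                 6, 8, 10, 10, 8, 4, 2, 2, 2, 2, 2, 2]
PRICE_PER_UNIT_HOUR = 1.0  # hypothetical price, same for both models


def provisioned_cost(demand, price):
    """Cost when capacity is sized for the peak hour and kept running all day."""
    return max(demand) * len(demand) * price


def consumption_cost(demand, price):
    """Cost when only the units actually used in each hour are billed."""
    return sum(demand) * price


if __name__ == "__main__":
    print("Provisioned for peak:", provisioned_cost(HOURLY_DEMAND, PRICE_PER_UNIT_HOUR))
    print("Pay for consumption:", consumption_cost(HOURLY_DEMAND, PRICE_PER_UNIT_HOUR))
```

With this made-up demand curve, peak provisioning pays for 240 unit-hours while consumption billing pays for 116, so the gap grows with how spiky the workload is.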