Running cost effective big data workloads with Azure Synapse and Azure Data Lake Storage (Build 2020-INT130)



The presentation discusses how to migrate expensive open source big data workloads to Azure and leverage the latest compute and storage innovations in Azure Synapse with Azure Data Lake Storage to develop powerful and cost-effective analytics solutions. It shows how you can bring your .NET expertise to bear with .NET for Apache Spark, and how the shared metadata experience in Synapse makes it easy to create a table in Spark and query it from T-SQL.


Running cost effective big data workloads with
Azure Synapse and Azure Data Lake Storage
James Baker
Michael Rys
Rukmani Gopalan

Agenda
1. Modernize your big data workloads
2. .NET for Apache Spark
3. Demo

Traditional on-prem analytics pipeline
[Diagram. Sources: operational databases, business/custom apps. Stores: enterprise data warehouse, data marts, linked by ETL steps. Consumers: reporting, analytics, data mining.]

Modern data warehouse
[Diagram. Sources: logs (structured), media (unstructured), files (unstructured), business/custom apps (structured). Stages: Ingest, Prep & train, Model & serve, Store. Services: Azure Data Factory, Azure Databricks, Azure SQL Data Warehouse, Azure Data Lake Storage, Power BI.]

Modern data warehouse with Azure Synapse
[Diagram. Sources: logs (structured), media (unstructured), files (unstructured), business/custom apps (structured). Services: Azure Synapse Analytics, Power BI. Store: Azure Data Lake Storage.]

Modern data warehouse with Azure Synapse
[Diagram. Sources: logs (structured), media (unstructured), files (unstructured), business/custom apps (structured). Azure Synapse layers: analytics runtimes (SQL), common data estate (shared metadata), unified experience (Synapse Studio). Store: Azure Data Lake Storage. Output: Power BI.]

Cost optimization with Azure Data Lake Storage
Disaggregated compute and storage with a shared metadata layer
Lifecycle management for optimizing TCO
Lower compute resource needs because of high performance
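
The lifecycle management point can be made concrete with a small sketch. The slides do not show a policy, so everything specific here is an assumption: the 30- and 180-day thresholds, the rule name, and the `raw/tweets/` prefix are hypothetical, and the JSON field names follow the Azure Storage lifecycle management policy schema as commonly documented, which should be checked against the current docs before use. The `expected_tier` helper is only a local approximation for reasoning about the rule, not an Azure API.

```python
import json

# Hypothetical thresholds, in days since last modification.
COOL_AFTER_DAYS = 30
ARCHIVE_AFTER_DAYS = 180


def build_lifecycle_policy(prefix: str) -> dict:
    """Build a lifecycle policy document for block blobs under `prefix`."""
    return {
        "rules": [
            {
                "name": "tier-down-aging-data",  # hypothetical rule name
                "enabled": True,
                "type": "Lifecycle",
                "definition": {
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        "prefixMatch": [prefix],
                    },
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {
                                "daysAfterModificationGreaterThan": COOL_AFTER_DAYS
                            },
                            "tierToArchive": {
                                "daysAfterModificationGreaterThan": ARCHIVE_AFTER_DAYS
                            },
                        }
                    },
                },
            }
        ]
    }


def expected_tier(days_since_modified: int) -> str:
    """Approximate locally which tier the rule above would leave a blob in."""
    if days_since_modified > ARCHIVE_AFTER_DAYS:
        return "Archive"
    if days_since_modified > COOL_AFTER_DAYS:
        return "Cool"
    return "Hot"


if __name__ == "__main__":
    print(json.dumps(build_lifecycle_policy("raw/tweets/"), indent=2))
    for age in (5, 45, 400):
        print(age, expected_tier(age))
```

Keeping the thresholds in named constants makes it easy to compare a few candidate policies against real access patterns before applying one to a storage account.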

.NET for Apache Spark and Azure Synapse
First-class C# and F# bindings to Apache Spark, bringing the power of big data analytics to .NET developers
Apache Spark 2.4/3.0: DataFrames, Structured Streaming, Delta Lake
Performance optimized with Apache Arrow and hardware vectorization
.NET Standard 2.0: C# and F#
ML.NET
First-class integration in Azure Synapse: batch submission, interactive .NET notebooks
Learn more at http://dot.net/Spark

Demo: .NET for Spark and shared metadata experience in Azure Synapse
Michael Rys, @MikeDoesBigData
Steps: Twitter CSV files; data prep with .NET for Spark; analysis with an interactive .NET for Spark notebook; seamless analysis with SQL
Questions: What has Michael been up to (mentions, topics)? Who was interacting with Michael?
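
The questions this demo asks of the tweet data (mentions, topics, and who interacted with a given user) can be illustrated with a minimal sketch. This is not the demo's code: the demo used .NET for Spark, while the sketch below uses plain Python on an in-memory CSV so it runs anywhere. The column names (`date`, `author`, `tweet`), the sample rows, and the `summarize` helper are all hypothetical, since the slides do not show the file layout.

```python
import csv
import io
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")
TOPIC = re.compile(r"#(\w+)")

# Hypothetical sample rows; the real demo files are not shown in the slides.
SAMPLE_CSV = """date,author,tweet
2020-05-19,MikeDoesBigData,"Talking #Spark and #dotnet with @alice at #MSBuild"
2020-05-20,bob,"Great session by @MikeDoesBigData on #Spark"
2020-05-20,alice,"@MikeDoesBigData thanks for the #dotnet demo"
"""


def summarize(csv_text: str, user: str):
    """Return (topics used by user, accounts user mentioned, authors mentioning user)."""
    topics = Counter()
    mentioned_by_user = Counter()
    interacting = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        text = row["tweet"]
        mentions = MENTION.findall(text)
        if row["author"] == user:
            # What has the user been up to: hashtags and mentions in their own tweets.
            topics.update(t.lower() for t in TOPIC.findall(text))
            mentioned_by_user.update(mentions)
        elif user in mentions:
            # Who was interacting with the user: other authors who mention them.
            interacting[row["author"]] += 1
    return topics, mentioned_by_user, interacting


if __name__ == "__main__":
    topics, mentioned, interacting = summarize(SAMPLE_CSV, "MikeDoesBigData")
    print("Topics:", dict(topics))
    print("Mentions:", dict(mentioned))
    print("Interacting:", dict(interacting))
```

In a Spark setting the same extraction would typically run as column expressions or user-defined functions over a DataFrame, with the result saved as a table so it can also be queried from SQL.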

Guidance from experts
Microsoft Docs
Explore overviews, tutorials,
code samples, and more.
Azure Data Lake Storage: https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-introduction
Azure Synapse Analytics: https://docs.microsoft.com/azure/synapse-analytics
.NET for Apache Spark: https://dot.net/Spark
