Data sources
Non-relational data
DESIGNED FOR THE
QUESTIONS YOU KNOW!
The Data Lake Approach
Ingest all data
regardless of
requirements
Store all data
in native format
without schema
definition
Do analysis
Hadoop, Spark, R,
Azure Data Lake
Analytics (ADLA)
Interactive queries
Batch queries
Machine Learning
Data warehouse
Real-time analytics
Devices
Microsoft’s Big Data Journey
We needed to better leverage data and analytics to
do more experimentation
So, we built a Data Lake for Microsoft:
• A data lake for everyone to put their data
• Tools approachable by any developer
• Batch, Interactive, Streaming, ML
By the numbers
• Exabytes of data under management
• 100Ks of Physical Servers
• 100Ks of Batch Jobs, Millions of Interactive Queries
• Huge Streaming Pipelines
• 10K+ Developers running diverse workloads and scenarios
[Figure: "Data Stored" over time (2010, 2013, 2017), annotated with the teams and products shown: Windows, SMSG, Live, Bing, CRM/Dynamics, Xbox Live, Office365, Malware Protection, Microsoft Stores, Commerce Risk, Skype, LCA, Exchange, Yammer]
Culture Changes
Engineering
How is the system performing? What is the experience my customers are
having? How does that correlate to other actions?
Is my feature successful?
Marketing
What can we observe from our customers to increase revenues?
Management
How do I drive my business based on the data?
Field
Where are there new opportunities? How can I connect with my
customers more deeply?
Support
How does this customer’s experience compare with others?
HDFS Compatible REST API
ADL Store
.NET, SQL, Python, R
scaled out by U-SQL
ADL Analytics
Open Source Apache
Hadoop ADL Client
Azure Databricks
HDInsight
Hive
• Performance at scale
• Optimized for analytics
• Multiple analytics engines
• Single repository sharing
Storage
• Architected and built for very high throughput at scale for Big Data workloads
• No limits to file size, account size or number of files
• Single-repository for sharing
• Cloud-scale distributed filesystem with file/folder ACLS and RBAC
• Encryption-at-rest by default with Azure Key Vault
• Authenticated access with Azure Active Directory integration
• Formal Certifications incl. ISO, SOC, PCI, HIPAA
Cloudera CDH
Hortonworks HDP
Qubole QDS
• Open Source Apache® ADL client for commercial and custom Hadoop
• Cloud IaaS and Hybrid
Azure Databricks
A fast, easy, and collaborative Apache Spark-based analytics platform
Best of Databricks
• Designed in collaboration with the founders of Apache Spark
• One-click setup; streamlined workflows
• Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts
Best of Microsoft
• Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage)
• Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)
HDInsight
• 63% lower TCO than on-premise*
• SLA-managed, monitored and supported by Microsoft
• Fully managed Hadoop, Spark and R
• Clusters deployed in minutes
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
ADL Analytics
.NET, SQL, Python, R scaled out by U-SQL
• Serverless. Pay per job. Starts in seconds. Scales instantly.
• Develop massively parallel
programs with simplicity
• Federated query from multiple data
sources
Ingress
• Event Hubs
• IoT Hub
• Kafka
Analytics
• Stream Analytics
• Spark Streaming
• Storm
Sinks
• Data Lake Store
• Blob Store
• SQL Database
• SQL Data Warehouse
• Event Hub
• Power BI
• Table Storage
• Service Bus Queues
• Service Bus Topics
• Cosmos DB
• Azure Functions
• …..
Azure Data Lake Store
1. Create small files
2. Copy small files
3. Concat + copy file
4. ASA
5. Event Hub Capture
• Copy
• SDK
• Tools (Storage Explorer, Visual Studio, 3rd Party)
• Data Factory
• SQL Integration Services
• Streaming from external sources
• Generated by cloud analytics
Scales out your custom code in .NET, Python, R over
your Data Lake
Familiar syntax to millions of SQL & .NET developers
Unifies
• Declarative nature of SQL with the imperative
power of your language of choice (e.g., C#,
Python)
• Processing of structured, semi-structured and
unstructured data
• Querying multiple Azure Data Sources
(Federated Query)
U-SQL
A framework for Big Data
Develop massively parallel programs with simplicity
A simple U-SQL script can scale
from Gigabytes to Petabytes
without learning complex big data
programming techniques.
U-SQL automatically generates a scaled
out and optimized execution plan to
handle any amount of data.
Execution nodes are rapidly allocated to run the
program.
Error handling, network issues, and
runtime optimization are handled
automatically.
@searchlog =
EXTRACT UserId int,
Start DateTime,
Region string,
Query string,
Duration int,
Urls string,
ClickedUrls string
FROM @"/Samples/Data/SearchLog.tsv"
USING Extractors.Tsv();
OUTPUT @searchlog
TO @"/Samples/Output/SearchLog_output.tsv"
USING Outputters.Tsv();
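To make the schema-on-read idea behind the script above concrete, here is a minimal Python sketch (not U-SQL, and not how ADLA implements it). It applies column names and types to tab-separated text only at read time, then writes the rows back out. The `SCHEMA` list mirrors the EXTRACT clause; the sample rows, the ISO timestamp format and the helper names `extract_tsv` and `output_tsv` are assumptions made for illustration.

```python
from datetime import datetime

# Column names and parsers mirroring the EXTRACT clause of the slide's script.
SCHEMA = [
    ("UserId", int),
    ("Start", datetime.fromisoformat),  # assumption: ISO-formatted timestamps
    ("Region", str),
    ("Query", str),
    ("Duration", int),
    ("Urls", str),
    ("ClickedUrls", str),
]


def extract_tsv(text, schema):
    """Parse tab-separated text, applying names and types at read time."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split("\t")
        if len(fields) != len(schema):
            raise ValueError(
                f"line {line_no}: expected {len(schema)} columns, got {len(fields)}"
            )
        rows.append({name: parse(value) for (name, parse), value in zip(schema, fields)})
    return rows


def output_tsv(rows, schema):
    """Serialize rows back to tab-separated text in schema order."""
    return "".join(
        "\t".join(str(row[name]) for name, _ in schema) + "\n" for row in rows
    )


# Made-up sample data; the real SearchLog.tsv is not reproduced here.
SAMPLE = (
    "399266\t2012-02-15T11:53:16\ten-us\thow to make nachos\t73\turl1;url2\turl1\n"
    "382045\t2012-02-15T11:53:18\ten-gb\tbest ski resorts\t614\turl3;url4\turl4\n"
)

rows = extract_tsv(SAMPLE, SCHEMA)
print(rows[0]["UserId"], rows[0]["Start"].year, rows[1]["Duration"])
```

The stored file stays untyped; only the reader decides that the first column is an integer and the second a timestamp, which is what lets different jobs read the same file with different schemas.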
• Automatic "in-lining" optimized out-of-the-box
• Per job parallelization visibility into execution
• Heatmap to identify bottlenecks
• Schema on Read
• Write to File
• Built-in and custom Extractors
and Outputters
• ADL Storage and Azure Blob
Storage
“Unstructured” Files
EXTRACT Expression
@s = EXTRACT a string, b int
FROM "filepath/file.csv"
USING Extractors.Csv(encoding: Encoding.Unicode);
• Built-in Extractors: Csv, Tsv, Text with lots of options, Parquet
• Custom Extractors: e.g., JSON, XML, etc. (see http://usql.io)
OUTPUT Expression
OUTPUT @s
TO "filepath/file.csv"
USING Outputters.Csv();
• Built-in Outputters: Csv, Tsv, Text, Parquet
• Custom Outputters: e.g., JSON, XML, etc. (see http://usql.io)
Filepath URIs
• Relative URI to default ADL Storage account: "filepath/file.csv"
• Absolute URIs:
• ADLS: "adl://account.azuredatalakestore.net/filepath/file.csv"
• WASB: "wasb://container@account/filepath/file.csv"
• Simple Patterns
• Virtual Columns
• Only on EXTRACT GA for now
• OUTPUT in Private Preview
File Sets
Simple pattern language on filename and path
@pattern string =
"/input/{date:yyyy}/{date:MM}/{date:dd}/{*}.{suffix}";
• Binds two columns date and suffix
• Wildcards the filename
• Limits on number of files and file sizes can be improved with
SET @@FeaturePreviews = "FileSetV2Dot5:on,InputFileGrouping:on,AsyncCompilerStoreAccess:on";
(Will become default between now and middle of year)
Virtual columns
EXTRACT name string
, suffix string // virtual column
, date DateTime // virtual column
FROM @pattern
USING Extractors.Csv();
• Refer to virtual columns in predicates to get partition elimination
• Warning gets raised if no partition elimination was found
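The following Python sketch illustrates how a file-set pattern can bind virtual columns and how a predicate on those columns prunes files before any data is read. It is a simplified model, not the U-SQL compiler's algorithm: it supports only the `{date:yyyy}`, `{date:MM}`, `{date:dd}`, `{*}` and named-column tokens from the slide, and the file names and helper functions are invented for the example.

```python
import re
from datetime import date

PATTERN = "/input/{date:yyyy}/{date:MM}/{date:dd}/{*}.{suffix}"

# Regex fragments for the date parts used in the slide's pattern.
_DATE_PARTS = {
    "yyyy": r"(?P<year>\d{4})",
    "MM": r"(?P<month>\d{2})",
    "dd": r"(?P<day>\d{2})",
}


def compile_pattern(pattern):
    """Translate the simplified file-set pattern into a compiled regex."""
    regex = ""
    pos = 0
    for token in re.finditer(r"\{([^}]*)\}", pattern):
        regex += re.escape(pattern[pos:token.start()])
        body = token.group(1)
        if body == "*":
            regex += r"[^/]*"  # wildcard: matched but not bound to a column
        elif body.startswith("date:"):
            regex += _DATE_PARTS[body[len("date:"):]]
        else:
            regex += rf"(?P<{body}>[^/.]+)"  # named virtual column
        pos = token.end()
    regex += re.escape(pattern[pos:])
    return re.compile(regex + r"\Z")


def virtual_columns(path, compiled):
    """Return the virtual column values for a path, or None if it does not match."""
    match = compiled.match(path)
    if match is None:
        return None
    g = match.groupdict()
    return {
        "date": date(int(g["year"]), int(g["month"]), int(g["day"])),
        "suffix": g["suffix"],
    }


def select_files(paths, compiled, predicate):
    """Keep only files whose virtual columns satisfy the predicate."""
    selected = []
    for path in paths:
        cols = virtual_columns(path, compiled)
        if cols is not None and predicate(cols):
            selected.append(path)
    return selected


# Hypothetical file listing.
FILES = [
    "/input/2018/02/19/clicks.csv",
    "/input/2018/02/20/clicks.csv",
    "/input/2018/02/20/views.tsv",
    "/other/2018/02/20/clicks.csv",
]

compiled = compile_pattern(PATTERN)
wanted = select_files(
    FILES,
    compiled,
    lambda c: c["date"] == date(2018, 2, 20) and c["suffix"] == "csv",
)
print(wanted)  # ['/input/2018/02/20/clicks.csv']
```

Because the predicate touches only values derived from the path, three of the four files are excluded without being opened, which is the effect the slide calls partition elimination.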
@rows = SELECT
Domain,
SUM(Clicks) AS TotalClicks
FROM @ClickData
GROUP BY Domain;
[Figure: two execution plans for the query above.
Plan 1, data not distributed by Domain: each of EXTENT 1, EXTENT 2 and EXTENT 3 holds rows for CNN, FB and WH. Stages shown: Read, Partial Agg, Partition, Full Agg, Write. This plan carries the label "Expensive!".
Plan 2, U-SQL table distributed by Domain: EXTENT 1 holds FB, EXTENT 2 holds WH, EXTENT 3 holds CNN. Stages shown: Read, Full Agg, Write.]
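The difference between the two plans can be sketched in Python. This is a toy model under stated assumptions, not the ADLA execution engine: extents are plain lists of `(domain, clicks)` pairs, the click counts are invented, and "rows moved" simply counts the partial results that have to cross extents.

```python
from collections import Counter, defaultdict


def aggregate_undistributed(extents):
    """Every extent holds every domain: partial agg, shuffle, then full agg."""
    partials = []
    for extent in extents:
        partial = Counter()
        for domain, clicks in extent:
            partial[domain] += clicks
        partials.append(partial)

    # Shuffle: all partial results for one domain must reach the same aggregator.
    shuffled = defaultdict(list)
    rows_moved = 0
    for partial in partials:
        for domain, subtotal in partial.items():
            shuffled[domain].append(subtotal)
            rows_moved += 1

    totals = {domain: sum(parts) for domain, parts in shuffled.items()}
    return totals, rows_moved


def aggregate_distributed(extents):
    """Each extent holds whole domains, so it can aggregate fully on its own."""
    totals = {}
    for extent in extents:
        local = Counter()
        for domain, clicks in extent:
            local[domain] += clicks
        # Sanity check: no domain appears in more than one extent.
        assert not (totals.keys() & local.keys())
        totals.update(local)
    return totals, 0  # nothing crosses extents


# Invented click data in the two layouts from the figure.
MIXED = [
    [("CNN", 3), ("FB", 5), ("WH", 1)],
    [("CNN", 2), ("FB", 4), ("WH", 7)],
    [("CNN", 6), ("FB", 1), ("WH", 2)],
]
BY_DOMAIN = [
    [("FB", 5), ("FB", 4), ("FB", 1)],
    [("WH", 1), ("WH", 7), ("WH", 2)],
    [("CNN", 3), ("CNN", 2), ("CNN", 6)],
]

mixed_totals, mixed_moved = aggregate_undistributed(MIXED)
dist_totals, dist_moved = aggregate_distributed(BY_DOMAIN)
print(mixed_totals == dist_totals, mixed_moved, dist_moved)
```

Both layouts give the same totals, but only the first one has to move partial results between extents, and that cost grows with the number of extents and distinct keys.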
[Figure: U-SQL catalog object model. Hierarchy: ADLA Account/Catalog, Database, Schema, with cardinality labels [1,n], [1,n] and [0,n]. Objects shown: tables, views, TVFs, Procedures, Table Types, Packages, Ext. tables, Data Source, Credentials, Statistics, Clustered Index, partitions, User objects, C# Assemblies, C# Fns, C# UDAgg, C# UDTs, C# Extractors, C# Reducers, C# Processors, C# Combiners, C# Outputters, C# Applier. Legend: "Refers to", "Contains", "Implemented and named by", MD Name, C# Name.]
• Naming
• Discovery
• Sharing
• Securing
U-SQL Catalog
Naming
• Default Database and Schema context: master.dbo
• Quote identifiers with []: [my table]
• Stores data in ADL Storage /catalog folder
Discovery
• Visual Studio Server Explorer
• Azure Data Lake Analytics Portal
• SDKs and Azure PowerShell commands
• Catalog Views: usql.databases, usql.tables etc.
Sharing
• Within an Azure Data Lake Analytics account
• Across ADLA accounts that share same Azure Active Directory:
• Referencing Assemblies
• Calling TVFs, Procedures and referencing tables and views
• Inserting into tables
Securing
• Secured with AAD principals at catalog and Database level
CREATE TABLE T (col1 int
, col2 string
, col3 SQL.MAP<string,string>
, INDEX idx CLUSTERED (col2 ASC)
PARTITIONED BY (col1)
DISTRIBUTED BY HASH (col2)
);
• Structured Data, built-in Data types only (no UDTs)
• Clustered Index (needs to be specified): row-oriented
• Fine-grained distribution (needs to be specified):
• HASH, DIRECT HASH, RANGE, ROUND ROBIN
• Addressable Partitions (optional)
CREATE TABLE T (INDEX idx CLUSTERED …) AS SELECT …;
CREATE TABLE T (INDEX idx CLUSTERED …) AS EXTRACT…;
CREATE TABLE T (INDEX idx CLUSTERED …) AS myTVF(DEFAULT);
• Infer the schema from the query
• Still requires index and distribution (does not support partitioning)
Data
Partitioning
Tables
Distribution Scheme    When to use?
HASH(keys)             Automatic hash for fast item lookup
DIRECT HASH(id)        Exact control of hash bucket value
RANGE(keys)            Keeps ranges together
ROUND ROBIN            To get equal distribution (if others give skew)
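As a rough illustration of how the four schemes assign rows to distribution buckets, here is a small Python model. It rests on several assumptions: `zlib.crc32` stands in for whatever hash function the system uses, DIRECT HASH is modeled as the supplied integer modulo the bucket count, and the RANGE split points are chosen by hand. None of this reflects ADLA internals.

```python
import bisect
import zlib
from collections import Counter


def hash_bucket(key, buckets):
    """HASH(keys): the system derives the bucket from a hash of the key."""
    return zlib.crc32(str(key).encode("utf-8")) % buckets


def direct_hash_bucket(bucket_id, buckets):
    """DIRECT HASH(id): the caller supplies the integer that picks the bucket."""
    return bucket_id % buckets


def range_bucket(key, split_points):
    """RANGE(keys): neighbouring keys land in the same bucket."""
    return bisect.bisect_right(split_points, key)


def round_robin_buckets(row_count, buckets):
    """ROUND ROBIN: rows are dealt out evenly, ignoring their values."""
    return [i % buckets for i in range(row_count)]


# Invented, heavily skewed key column: 90 rows for one domain, 10 for another.
keys = ["cnn.com"] * 90 + ["whitehouse.gov"] * 10

by_hash = Counter(hash_bucket(k, 4) for k in keys)
by_round_robin = Counter(round_robin_buckets(len(keys), 4))

print(max(by_hash.values()))   # at least 90: one bucket holds every cnn.com row
print(sorted(by_round_robin.values()))  # [25, 25, 25, 25]
print(range_bucket("m", ["h", "p"]))    # 1: between the two split points
```

Hashing keeps all rows for a key together, which helps lookups and joins but inherits any skew in the data; round robin removes the skew at the price of losing that locality.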
Partitions,
Distributions and
Clusters
TABLE
T ( id …
, C …
, date DateTime, …
, INDEX i
CLUSTERED (id, C)
PARTITIONED BY (date)
DISTRIBUTED BY
HASH(id) INTO 4
)
[Figure: LOGICAL view: table T is split into PARTITION (@date1), PARTITION (@date2) and PARTITION (@date3); each partition is divided into hash distributions (HASH DISTRIBUTION 1 to 4), and each distribution holds clustered values (C1 to C10). PHYSICAL view: one file per partition under /catalog/…/tables/Guid(T)/, named Guid(T.p1).ss, Guid(T.p2).ss and Guid(T.p3).ss.]
Benefits of Table clustering and distribution
• Faster lookup of data provided by distribution and clustering when right
distribution/cluster is chosen
• Data distribution provides better localized scale out
• Used for filters, joins and grouping
Benefits of Table partitioning
• Provides data life cycle management (“expire” old partitions):
Partition on date/time dimension
• Partial re-computation of data at partition level
• Query predicates can provide partition elimination
Do not use when…
• No filters, joins and grouping
• No reuse of the data for future queries
If in doubt: use sampling (e.g., SAMPLE ANY(x)) and test.
Benefits of
Distribution in
Tables
Benefits
• Design for most frequent/costly queries
• Manage data skew in partition/table
• Manage parallelism in querying (by number of
distributions)
• Manage minimizing data movement in joins
• Provide distribution seeks and range scans for query
predicates (distribution bucket elimination)
Distribution in tables is mandatory; choose according to
desired benefits
Benefits of
Clustered Index
in Distribution
Benefits
• Design for most frequent/costly queries
• Manage data skew in distribution bucket
• Provide locality of same data values
• Provide seeks and range scans for query predicates (index
lookup)
Clustered index in tables is mandatory; choose according to
desired benefits
Pro Tip:
Distribution keys should be prefix of Clustered Index keys:
Especially for RANGE distribution
Optimizer will make use of global ordering then:
If you make the RANGE distribution key a prefix of the index key, U-SQL
will repartition on demand to align any UNION ALLed or JOINed tables or
partitions!
Split points of table distribution partitions are chosen independently, so
any partitioned table can do UNION ALL in this manner if the data is to
be processed subsequently on the distribution key.
Benefits of
Partitioned Tables
Benefits
• Partitions are addressable
• Enables finer-grained data lifecycle management at
partition level
• Manage parallelism in querying by number of partitions
• Query predicates provide partition elimination
• Predicate has to be constant-foldable
Use partitioned tables for
• Managing large amounts of incrementally growing
structured data
• Queries with strong locality predicates
• point in time, for specific market etc
• Managing windows of data
• provide data for last x months for processing
Partitioned tables
• Use partitioned tables for querying parts of large amounts of incrementally growing structured data
• Get partition elimination optimizations with the right query predicates
Creating partition table
CREATE TABLE vehiclesP(vehicle_id int, event_date DateTime, lat float, long float
, INDEX idx CLUSTERED (vehicle_id ASC)
PARTITIONED BY(event_date) DISTRIBUTED BY HASH (vehicle_id) INTO 4);
Creating partitions
DECLARE @pdate1 DateTime = new DateTime(2014, 9, 14, 00,00,00,00,DateTimeKind.Utc);
DECLARE @pdate2 DateTime = new DateTime(2014, 9, 15, 00,00,00,00,DateTimeKind.Utc);
ALTER TABLE vehiclesP ADD PARTITION (@pdate1), PARTITION (@pdate2);
Loading data into partitions dynamically
DECLARE @date1 DateTime = DateTime.Parse("2014-09-14");
DECLARE @date2 DateTime = DateTime.Parse("2014-09-16");
INSERT INTO vehiclesP ON INTEGRITY VIOLATION IGNORE
SELECT vehicle_id, event_date, lat, long FROM @data
WHERE event_date >= @date1 AND event_date <= @date2;
• Filters and inserts clean data only, ignore “dirty” data
Loading data into partitions statically
ALTER TABLE vehiclesP ADD PARTITION (@pdate1), PARTITION (@baddate);
INSERT INTO vehiclesP ON INTEGRITY VIOLATION MOVE TO @baddate
SELECT vehicle_id, lat, long FROM @data
WHERE event_date >= @date1 AND event_date <= @date2;
• Filters and inserts clean data only, put “dirty” data into special partition
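The two ways of handling rows that have no matching partition can be modeled in a few lines of Python. This is a behavioral sketch only: partitions are a dict of lists, `insert_rows` and its `on_violation` values are hypothetical names, and using `date.min` as the key of the "bad data" partition is an assumption made for the example.

```python
from datetime import date


def insert_rows(partitions, rows, on_violation="error", bad_partition=None):
    """Insert rows into the partition matching their event_date.

    partitions: dict mapping a partition key (a date) to a list of rows.
    rows: iterable of (vehicle_id, event_date, lat, long) tuples.
    on_violation: "error", "ignore", or "move" (to bad_partition).
    Returns the number of rows that had no matching partition.
    """
    if on_violation == "move" and bad_partition not in partitions:
        raise ValueError("bad_partition must be an existing partition")

    planned = []
    violations = 0
    for row in rows:
        key = row[1]
        if key in partitions and key != bad_partition:
            planned.append((key, row))
            continue
        violations += 1
        if on_violation == "move":
            planned.append((bad_partition, row))
        elif on_violation != "ignore":
            raise KeyError(f"no partition for {key}")

    # Apply only after every row has been checked, so an error inserts nothing.
    for key, row in planned:
        partitions[key].append(row)
    return violations


BAD = date.min  # hypothetical key for the "dirty data" partition
data = [
    (1, date(2014, 9, 14), 47.6, -122.3),
    (2, date(2014, 9, 15), 47.7, -122.2),
    (3, date(2014, 9, 16), 47.8, -122.1),  # no partition exists for this date
]

ignoring = {date(2014, 9, 14): [], date(2014, 9, 15): []}
dropped = insert_rows(ignoring, data, on_violation="ignore")

moving = {date(2014, 9, 14): [], date(2014, 9, 15): [], BAD: []}
moved = insert_rows(moving, data, on_violation="move", bad_partition=BAD)

print(dropped, moved, len(moving[BAD]))  # 1 1 1
```

Ignoring keeps the table clean but loses the offending rows; moving them to a dedicated partition keeps them available for later inspection or repair.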
What is Table Fragmentation
• ADLS is an append-only store!
• Every INSERT statement is creating a new file (INSERT fragment)
Why is it bad?
• Every INSERT fragment contains data in its own distribution buckets, thus
query processing loses ability to get “localized” fast access
• Query generation has to read from many files now -> slow preparation
phase that may time out.
• Reading from too many files is disallowed:
Current LIMIT: 3000 table partitions and INSERT fragments per job!
What if I have to add data incrementally?
• Batch inserts into table
• Use ALTER TABLE REBUILD/ALTER TABLE REBUILD PARTITION regularly
to reduce fragmentation and keep performance.
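A little arithmetic shows why batching matters. The Python sketch below assumes, as the slide states, that every INSERT statement adds one fragment, and it treats the 3000 limit as a simple sum of partitions and INSERT fragments; the hourly arrival rate and the function names are invented for illustration.

```python
import math

FRAGMENT_LIMIT = 3000  # limit quoted on the slide


def fragments_created(arrivals, batch_size):
    """Number of INSERT fragments if arrivals are grouped into batches."""
    return math.ceil(arrivals / batch_size)


def inserts_until_rebuild(partitions, limit=FRAGMENT_LIMIT):
    """How many more INSERTs fit before a job reading everything hits the limit."""
    return max(0, limit - partitions)


hourly_loads_per_year = 24 * 365  # hypothetical: one small load per hour

unbatched = fragments_created(hourly_loads_per_year, batch_size=1)
daily = fragments_created(hourly_loads_per_year, batch_size=24)

print(unbatched, unbatched > FRAGMENT_LIMIT)  # 8760 True
print(daily, daily > FRAGMENT_LIMIT)          # 365 False
print(inserts_until_rebuild(partitions=365))  # 2635
```

Under these assumptions, one insert per hour would exceed the limit in about four months, while one insert per day stays well below it; a periodic rebuild resets the fragment count in either case.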
[Figure annotation: "Dips down to 1 active vertex at these times"]
High-level
Roadmap
• Worldwide Region Availability (currently US and EU)
• Interactive Access with T-SQL query
• Scale out your custom code in the language of choice
(.NET, Java, Python, etc.)
• Process the data formats of your choice (incl. Parquet,
ORC; larger string values)
• Continued ADF, AAS, ADC, SQL DW, EventHub, SSIS
integration
• Administrative policies to control usage/cost for storage
& compute
• Secure data sharing between common AAD and public
read-only sharing, fine grained ACLing
• Intense focus on developer productivity for authoring,
debugging, and optimization
• General customer feedback
http://aka.ms/adlfeedback
Resources
http://usql.io
http://blogs.msdn.microsoft.com/azuredatalake/
http://blogs.msdn.microsoft.com/mrys/
https://channel9.msdn.com/Search?term=U-SQL#ch9Search
http://aka.ms/usql_reference
https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-u-sql-programmability-guide
https://docs.microsoft.com/en-us/azure/data-lake-analytics/
https://msdn.microsoft.com/en-us/magazine/mt614251
https://msdn.microsoft.com/magazine/mt790200
http://www.slideshare.net/MichaelRys
Getting Started with R in U-SQL
https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-u-sql-python-extensions
https://social.msdn.microsoft.com/Forums/azure/en-US/home?forum=AzureDataLake
http://stackoverflow.com/questions/tagged/u-sql
http://aka.ms/adlfeedback
Continue your education at
Microsoft Virtual Academy
online.
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Training Day)
Big Data Processing with Spark and .NET - Microsoft Ignite 2019Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
 
U-SQL Learning Resources (SQLBits 2016)
U-SQL Learning Resources (SQLBits 2016)U-SQL Learning Resources (SQLBits 2016)
U-SQL Learning Resources (SQLBits 2016)
 
U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)
 
U-SQL Does SQL (SQLBits 2016)
U-SQL Does SQL (SQLBits 2016)U-SQL Does SQL (SQLBits 2016)
U-SQL Does SQL (SQLBits 2016)
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
 
U-SQL Intro (SQLBits 2016)
U-SQL Intro (SQLBits 2016)U-SQL Intro (SQLBits 2016)
U-SQL Intro (SQLBits 2016)
 
Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)
 


Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Training Day)

  • 1.
  • 2.
  • 3.
  • 4.
  • 5. Data sources Non-relational data DESIGNED FOR THE QUESTIONS YOU KNOW!
  • 6. The Data Lake Approach Ingest all data regardless of requirements Store all data in native format without schema definition Do analysis Hadoop, Spark, R, Azure Data Lake Analytics (ADLA) Interactive queries Batch queries Machine Learning Data warehouse Real-time analytics Devices
  • 7. Microsoft’s Big Data Journey We needed to better leverage data and analytics to do more experimentation So, we built a Data Lake for Microsoft: • A data lake for everyone to put their data • Tools approachable by any developer • Batch, Interactive, Streaming, ML By the numbers • Exabytes of data under management • 100Ks of Physical Servers • 100Ks of Batch Jobs, Millions of Interactive Queries • Huge Streaming Pipelines • 10K+ Developers running diverse workloads and scenarios 2010 2013 2017 Windows SMSG Live Bing CRM/Dynamics Xbox Live Office365 Malware Protection Microsoft Stores Commerce Risk Skype LCA Exchange Yammer Data Stored
  • 8. Culture Changes Engineering How is the system performing? What is the experience my customers are having? How does that correlate to other actions? Is my feature successful? Marketing What can we observe from our customers to increase revenues? Management How do I drive my business based on the data? Field Where are there new opportunities? How can I connect with my customers more deeply? Support How does this customer’s experience compare with others?
  • 9. HDFS Compatible REST API ADL Store .NET, SQL, Python, R scaled out by U-SQL ADL Analytics Open Source Apache Hadoop ADL Client Azure Databricks HDInsight Hive • Performance at scale • Optimized for analytics • Multiple analytics engines • Single repository sharing
  • 10. HDFS Compatible REST API ADL Store Storage • Architected and built for very high throughput at scale for Big Data workloads • No limits to file size, account size or number of files • Single repository for sharing • Cloud-scale distributed filesystem with file/folder ACLs and RBAC • Encryption-at-rest by default with Azure Key Vault • Authenticated access with Azure Active Directory integration • Formal certifications incl. ISO, SOC, PCI, HIPAA
  • 11. HDFS Compatible REST API ADL Store Analytics Storage Cloudera CDH Hortonworks HDP Qubole QDS • Open Source Apache® ADL client for commercial and custom Hadoop • Cloud IaaS and Hybrid
  • 12. AZURE DATABRICKS: A FAST, EASY, AND COLLABORATIVE APACHE SPARK BASED ANALYTICS PLATFORM Best of Databricks Best of Microsoft • Designed in collaboration with the founders of Apache Spark • One-click set up; streamlined workflows • Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts • Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage) • Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)
  • 13. HDFS Compatible REST API HDInsight ADL Store Hive Analytics Storage • 63% lower TCO than on-premises* • SLA-managed, monitored and supported by Microsoft • Fully managed Hadoop, Spark and R • Clusters deployed in minutes *IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
  • 14. HDFS Compatible REST API ADL Store .NET, SQL, Python, R scaled out by U-SQL ADL Analytics • Serverless. Pay per job. Starts in seconds. Scales instantly. • Develop massively parallel programs with simplicity • Federated query from multiple data sources
  • 15.
  • 16.
  • 17.
  • 18. Ingress • Event Hubs • IoT Hub • Kafka Analytics • Stream Analytics • Spark Streaming • Storm Sinks • Data Lake Store • Blob Store • SQL Database • SQL Data Warehouse • Event Hub • Power BI • Table Storage • Service Bus Queues • Service Bus Topics • Cosmos DB • Azure Functions • …..
  • 19.
  • 20.
  • 21.
  • 22. Azure Data Lake Store 1Create small files 2Copy small files 3Concat + copy file 4ASA 5Event Hub Capture
  • 23. • Copy • SDK • Tools (Storage Explorer, Visual Studio, 3rd Party) • Data Factory • SQL Integration Services • Streaming from external sources • Generated by cloud analytics
  • 24.
  • 25.
  • 26. Scales out your custom code in .NET, Python, R over your Data Lake Familiar syntax to millions of SQL & .NET developers Unifies • Declarative nature of SQL with the imperative power of your language of choice (e.g., C#, Python) • Processing of structured, semi-structured and unstructured data • Querying multiple Azure Data Sources (Federated Query) U-SQL A framework for Big Data
  • 27. Develop massively parallel programs with simplicity A simple U-SQL script can scale from gigabytes to petabytes without learning complex big data programming techniques. U-SQL automatically generates a scaled-out and optimized execution plan to handle any amount of data. Execution nodes are rapidly allocated to run the program. Error handling, network issues, and runtime optimization are handled automatically. @searchlog = EXTRACT UserId int, Start DateTime, Region string, Query string, Duration int, Urls string, ClickedUrls string FROM @"/Samples/Data/SearchLog.tsv" USING Extractors.Tsv(); OUTPUT @searchlog TO @"/Samples/Output/SearchLog_output.tsv" USING Outputters.Tsv();
  • 28.
  • 29. • Automatic "in-lining" optimized out-of-the-box • Per job parallelization visibility into execution • Heatmap to identify bottlenecks
  • 30. • Schema on Read • Write to File • Built-in and custom Extractors and Outputters • ADL Storage and Azure Blob Storage “Unstructured” Files EXTRACT Expression @s = EXTRACT a string, b int FROM "filepath/file.csv" USING Extractors.Csv(encoding: Encoding.Unicode); • Built-in Extractors: Csv, Tsv, Text with lots of options, Parquet • Custom Extractors: e.g., JSON, XML, etc. (see http://usql.io) OUTPUT Expression OUTPUT @s TO "filepath/file.csv" USING Outputters.Csv(); • Built-in Outputters: Csv, Tsv, Text, Parquet • Custom Outputters: e.g., JSON, XML, etc. (see http://usql.io) Filepath URIs • Relative URI to default ADL Storage account: "filepath/file.csv" • Absolute URIs: • ADLS: "adl://account.azuredatalakestore.net/filepath/file.csv" • WASB: "wasb://container@account/filepath/file.csv"
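The "schema on read" idea on slide 30 can be sketched outside U-SQL. The Python below is only an illustration under stated assumptions: `extract` and `output` are hypothetical helpers that loosely mirror the roles of `EXTRACT ... USING Extractors.Csv()` and `OUTPUT ... USING Outputters.Csv()`; they are not the U-SQL implementation. The point is that the file itself carries no schema, so column names and types are supplied by the reader at query time.

```python
import csv
import io


def extract(text, schema, delimiter=","):
    """Parse delimited text, applying column names and types only at read time.

    `schema` is a list of (column_name, cast_function) pairs, e.g. [("a", str), ("b", int)].
    """
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for lineno, fields in enumerate(reader, start=1):
        if len(fields) != len(schema):
            raise ValueError(
                f"line {lineno}: expected {len(schema)} columns, got {len(fields)}"
            )
        rows.append({name: cast(value) for (name, cast), value in zip(schema, fields)})
    return rows


def output(rows, columns, delimiter=","):
    """Serialize rows back to delimited text, one line per row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    for row in rows:
        writer.writerow([row[c] for c in columns])
    return buf.getvalue()
```

The same bytes can be read with a different schema later (for example, treating `b` as a string), which is the flexibility the slide contrasts with schema-on-write storage.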
  • 31. • Simple Patterns • Virtual Columns • Only on EXTRACT GA for now • OUTPUT in Private Preview File Sets Simple pattern language on filename and path @pattern string = "/input/{date:yyyy}/{date:MM}/{date:dd}/{*}.{suffix}"; • Binds two columns date and suffix • Wildcards the filename • Limits on number of files and file sizes can be improved with SET @@FeaturePreviews = "FileSetV2Dot5:on,InputFileGrouping:on, AsyncCompilerStoreAccess:on"; (Will become default between now and middle of year) Virtual columns EXTRACT name string , suffix string // virtual column , date DateTime // virtual column FROM @pattern USING Extractors.Csv(); • Refer to virtual columns in predicates to get partition elimination • Warning gets raised if no partition elimination was found
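To make the file set pattern and partition elimination on slide 31 concrete, here is a small Python sketch. It is an assumption-laden model, not how U-SQL compiles file sets: `compile_pattern`, `virtual_columns`, and `select_files` are hypothetical names, and only the tokens that appear in the slide's pattern are supported. It shows how path segments become the virtual columns `date` and `suffix`, and how a predicate on them decides which files are opened at all.

```python
import re
from datetime import date

# The pattern from the slide; the token-to-regex mapping below is illustrative only.
PATTERN = "/input/{date:yyyy}/{date:MM}/{date:dd}/{*}.{suffix}"


def compile_pattern(pattern):
    """Translate the file set pattern into a regular expression."""
    tokens = {
        "{date:yyyy}": r"(?P<year>\d{4})",
        "{date:MM}": r"(?P<month>\d{2})",
        "{date:dd}": r"(?P<day>\d{2})",
        "{*}": r"[^/]*",
        "{suffix}": r"(?P<suffix>[^/.]+)",
    }
    regex = re.escape(pattern)
    for token, group in tokens.items():
        regex = regex.replace(re.escape(token), group)
    return re.compile("^" + regex + "$")


FILE_SET = compile_pattern(PATTERN)


def virtual_columns(path):
    """Return the virtual columns bound by the pattern, or None if the path does not match."""
    m = FILE_SET.match(path)
    if m is None:
        return None
    return {
        "date": date(int(m["year"]), int(m["month"]), int(m["day"])),
        "suffix": m["suffix"],
    }


def select_files(paths, predicate):
    """Partition elimination: keep only files whose virtual columns satisfy the predicate."""
    selected = []
    for path in paths:
        columns = virtual_columns(path)
        if columns is not None and predicate(columns):
            selected.append(path)
    return selected
```

A predicate such as `lambda c: c["date"] == date(2018, 2, 21)` narrows the input to one day's folder before any file content is read, which is why the slide recommends referring to virtual columns in predicates.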
  • 32.
  • 33.
  • 34.
  • 35. @rows = SELECT Domain, SUM(Clicks) AS TotalClicks FROM @ClickData GROUP BY Domain;
  • 36. [Diagram: two execution plans for the GROUP BY Domain query. Over unstructured data, where EXTENT 1, EXTENT 2 and EXTENT 3 each hold CNN, FB and WH rows, the plan has the stages Read, Partial Agg, Partition, Full Agg and Write, and is labelled "Expensive!". Over a U-SQL table distributed by Domain, where FB is in EXTENT 1, WH in EXTENT 2 and CNN in EXTENT 3, the plan has only the stages Read, Full Agg and Write.]
  • 37. [Diagram: U-SQL catalog object model.] ADLA Account/Catalog, Database [1,n], Schema [1,n], with [0,n] tables, views, TVFs, C# Fns, C# UDAgg, Clustered Index, partitions, C# Assemblies, C# Extractors, Data Source, C# Reducers, C# Processors, C# Combiners, C# Outputters, Ext. tables, Procedures, Credentials, C# Applier, Table Types, Statistics, C# UDTs, Packages. Legend: User objects; Refers to; Contains; Implemented and named by; MD Name; C# Name.
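The difference between the two plans on slide 36 can be modelled in a few lines of Python. This is a sketch under assumptions, not the ADLA runtime: extents are plain lists of `(domain, clicks)` tuples, and `aggregate_unpartitioned` and `aggregate_distributed` are hypothetical names. The first function needs a repartitioning step that moves partial results between nodes; the second can finish each domain locally because every domain lives in exactly one extent.

```python
from collections import Counter, defaultdict


def aggregate_unpartitioned(extents):
    """SUM(Clicks) GROUP BY Domain when every extent may contain every domain.

    Returns (totals, moved), where `moved` counts the partial results that had to
    be shuffled to the node responsible for their domain.
    """
    # Stage 1: partial aggregation inside each extent.
    partials = []
    for rows in extents:
        partial = Counter()
        for domain, clicks in rows:
            partial[domain] += clicks
        partials.append(partial)

    # Stage 2: repartition the partial results by domain (the costly data movement).
    shuffled = defaultdict(list)
    moved = 0
    for partial in partials:
        for domain, subtotal in partial.items():
            shuffled[domain].append(subtotal)
            moved += 1

    # Stage 3: full aggregation per domain.
    totals = {domain: sum(subtotals) for domain, subtotals in shuffled.items()}
    return totals, moved


def aggregate_distributed(extents):
    """Same query when the table is distributed by Domain (one domain per extent)."""
    totals = {}
    for rows in extents:
        local = Counter()
        for domain, clicks in rows:
            local[domain] += clicks
        totals.update(local)  # local totals are already final, nothing is shuffled
    return totals, 0
```

Both functions return the same totals; only the amount of data movement differs, which is the motivation the deck gives for distributing tables on the grouping key.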
  • 38. • Naming • Discovery • Sharing • Securing U-SQL Catalog Naming • Default Database and Schema context: master.dbo • Quote identifiers with []: [my table] • Stores data in ADL Storage /catalog folder Discovery • Visual Studio Server Explorer • Azure Data Lake Analytics Portal • SDKs and Azure Powershell commands • Catalog Views: usql.databases, usql.tables etc. Sharing • Within an Azure Data Lake Analytics account • Across ADLA accounts that share same Azure Active Directory: • Referencing Assemblies • Calling TVFs, Procedures and referencing tables and views • Inserting into tables Securing • Secured with AAD principals at catalog and Database level
  • 39. CREATE TABLE T (col1 int , col2 string , col3 SQL.MAP<string,string> , INDEX idx CLUSTERED (col2 ASC) PARTITIONED BY (col1) DISTRIBUTED BY HASH (col2) ); • Structured Data, built-in Data types only (no UDTs) • Clustered Index (needs to be specified): row-oriented • Fine-grained distribution (needs to be specified): • HASH, DIRECT HASH, RANGE, ROUND ROBIN • Addressable Partitions (optional) CREATE TABLE T (INDEX idx CLUSTERED …) AS SELECT …; CREATE TABLE T (INDEX idx CLUSTERED …) AS EXTRACT…; CREATE TABLE T (INDEX idx CLUSTERED …) AS myTVF(DEFAULT); • Infer the schema from the query • Still requires index and distribution (does not support partitioning)
  • 40. Data Partitioning Tables Distribution Scheme When to use? HASH(keys) Automatic Hash for fast item lookup DIRECT HASH(id) Exact control of hash bucket value RANGE(keys) Keeps ranges together ROUND ROBIN To get equal distribution (if others give skew)
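The trade-off between HASH and ROUND ROBIN in the table above can be illustrated with a short Python sketch. The assumptions are loud ones: `zlib.crc32` stands in for whatever hash function U-SQL actually uses (the slides do not say), rows are plain dictionaries, and the function names are hypothetical. HASH keeps equal keys together, which helps lookups but inherits any skew in the data; ROUND ROBIN ignores the key and always yields near-equal buckets.

```python
import zlib


def hash_bucket(key, buckets):
    """Map a string key to a bucket; crc32 is a stand-in deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % buckets


def distribute_hash(rows, key, buckets):
    """HASH(keys): rows with the same key value always land in the same bucket."""
    out = [[] for _ in range(buckets)]
    for row in rows:
        out[hash_bucket(row[key], buckets)].append(row)
    return out


def distribute_round_robin(rows, buckets):
    """ROUND ROBIN: ignore the key and deal rows out evenly."""
    out = [[] for _ in range(buckets)]
    for i, row in enumerate(rows):
        out[i % buckets].append(row)
    return out


def bucket_sizes(distribution):
    return [len(bucket) for bucket in distribution]
```

With a data set in which one domain accounts for most rows, the hash distribution puts all of that domain's rows into a single bucket, while round robin spreads them evenly at the cost of losing key locality.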
  • 41. Partitions, Distributions and Clusters TABLE T ( id … , C … , date DateTime, … , INDEX i CLUSTERED (id, C) PARTITIONED BY (date) DISTRIBUTED BY HASH(id) INTO 4 ) [Diagram: the LOGICAL view shows the partitions PARTITION (@date1), PARTITION (@date2) and PARTITION (@date3), each split into hash distributions (HASH DISTRIBUTION 1 to 4) that hold clustered values C1 to C10; the PHYSICAL view shows one file per partition under /catalog/…/tables/Guid(T)/: Guid(T.p1).ss, Guid(T.p2).ss, Guid(T.p3).ss.]
  • 42. Benefits of Table clustering and distribution • Faster lookup of data provided by distribution and clustering when right distribution/cluster is chosen • Data distribution provides better localized scale out • Used for filters, joins and grouping Benefits of Table partitioning • Provides data life cycle management (“expire” old partitions): Partition on date/time dimension • Partial re-computation of data at partition level • Query predicates can provide partition elimination Do not use when… • No filters, joins and grouping • No reuse of the data for future queries If in doubt: use sampling (e.g., SAMPLE ANY(x)) and test.
  • 43. Benefits of Distribution in Tables Benefits • Design for most frequent/costly queries • Manage data skew in partition/table • Manage parallelism in querying (by number of distributions) • Manage minimizing data movement in joins • Provide distribution seeks and range scans for query predicates (distribution bucket elimination) Distribution in tables is mandatory; choose according to desired benefits
  • 44. Benefits of Clustered Index in Distribution Benefits • Design for most frequent/costly queries • Manage data skew in distribution bucket • Provide locality of same data values • Provide seeks and range scans for query predicates (index lookup) Clustered index in tables is mandatory; choose according to desired benefits Pro Tip: Distribution keys should be a prefix of the clustered index keys, especially for RANGE distribution. The optimizer will then make use of global ordering: if you make the RANGE distribution key a prefix of the index key, U-SQL will repartition on demand to align any UNION ALLed or JOINed tables or partitions! Split points of table distribution partitions are chosen independently, so any partitioned table can do UNION ALL in this manner if the data is to be processed subsequently on the distribution key.
  • 45. Benefits of Partitioned Tables Benefits • Partitions are addressable • Enables finer-grained data lifecycle management at partition level • Manage parallelism in querying by number of partitions • Query predicates provide partition elimination • Predicate has to be constant-foldable Use partitioned tables for • Managing large amounts of incrementally growing structured data • Queries with strong locality predicates • point in time, for specific market etc • Managing windows of data • provide data for last x months for processing
  • 46. Partitioned tables • Use partitioned tables for querying parts of large amounts of incrementally growing structured data • Get partition elimination optimizations with the right query predicates Creating partition table CREATE TABLE vehiclesP(vehicle_id int, event_date DateTime, lat float, long float , INDEX idx CLUSTERED (vehicle_id ASC) PARTITIONED BY(event_date) DISTRIBUTED BY HASH (vehicle_id) INTO 4); Creating partitions DECLARE @pdate1 DateTime = new DateTime(2014, 9, 14, 00,00,00,00,DateTimeKind.Utc); DECLARE @pdate2 DateTime = new DateTime(2014, 9, 15, 00,00,00,00,DateTimeKind.Utc); ALTER TABLE vehiclesP ADD PARTITION (@pdate1), PARTITION (@pdate2); Loading data into partitions dynamically DECLARE @date1 DateTime = DateTime.Parse("2014-09-14"); DECLARE @date2 DateTime = DateTime.Parse("2014-09-16"); INSERT INTO vehiclesP ON INTEGRITY VIOLATION IGNORE SELECT vehicle_id, event_date, lat, long FROM @data WHERE event_date >= @date1 AND event_date <= @date2; • Filters and inserts clean data only, ignores “dirty” data Loading data into partitions statically ALTER TABLE vehiclesP ADD PARTITION (@pdate1), PARTITION (@baddate); INSERT INTO vehiclesP ON INTEGRITY VIOLATION MOVE TO @baddate SELECT vehicle_id, lat, long FROM @data WHERE event_date >= @date1 AND event_date <= @date2; • Filters and inserts clean data only, puts “dirty” data into a special partition
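The two integrity-violation behaviours on slide 46, and the partition elimination they enable, can be modelled roughly as follows. This Python sketch rests on assumptions: a partitioned table is a dictionary from partition key to a list of rows, `BAD` is a hypothetical key playing the role of the `@baddate` partition, and `insert_rows` and `scan` are illustrative names rather than U-SQL features.

```python
from datetime import date

BAD = "bad"  # hypothetical key standing in for the catch-all @baddate partition


def insert_rows(partitions, rows, on_violation="ignore"):
    """Route rows to the partition matching event_date.

    Rows with no matching partition are dropped ("ignore") or appended to the
    BAD partition ("move"). Returns the number of rows placed in a regular partition.
    """
    if on_violation not in ("ignore", "move"):
        raise ValueError("on_violation must be 'ignore' or 'move'")
    if on_violation == "move" and BAD not in partitions:
        raise KeyError("the BAD partition must be added before rows can be moved to it")
    accepted = 0
    for row in rows:
        key = row["event_date"]
        if key != BAD and key in partitions:
            partitions[key].append(row)
            accepted += 1
        elif on_violation == "move":
            partitions[BAD].append(row)
    return accepted


def scan(partitions, wanted_dates):
    """Partition elimination: read only the partitions named by the predicate."""
    touched = [key for key in partitions if key in wanted_dates]
    rows = [row for key in touched for row in partitions[key]]
    return rows, len(touched)
```

Because each partition is addressable by its key, a query restricted to one date touches one partition regardless of how many others exist, and an old partition can be dropped as a unit.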
  • 47. What is Table Fragmentation • ADLS is an append-only store! • Every INSERT statement is creating a new file (INSERT fragment) Why is it bad? • Every INSERT fragment contains data in its own distribution buckets, thus query processing loses ability to get “localized” fast access • Query generation has to read from many files now -> slow preparation phase that may time out. • Reading from too many files is disallowed: Current LIMIT: 3000 table partitions and INSERT fragments per job! What if I have to add data incrementally? • Batch inserts into table • Use ALTER TABLE REBUILD/ALTER TABLE REBUILD PARTITION regularly to reduce fragmentation and keep performance.
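Why many small INSERTs hurt, and what a rebuild buys back, can be shown with a toy model. The Python below is a sketch under assumptions and not the ADLS storage format: the `Table` class is hypothetical, each `insert` call appends one fragment with its own set of distribution buckets (mirroring the append-only behaviour the slide describes), and `rebuild` merges all fragments into one.

```python
class Table:
    """Toy model of an append-only table distributed by `id` into a fixed number of buckets."""

    def __init__(self, buckets):
        self.buckets = buckets
        self.fragments = []  # one entry per INSERT; each entry is a list of buckets

    def insert(self, rows):
        """Every insert creates a new fragment with its own distribution buckets."""
        fragment = [[] for _ in range(self.buckets)]
        for row in rows:
            fragment[row["id"] % self.buckets].append(row)
        self.fragments.append(fragment)

    def lookup(self, row_id):
        """Find rows by id; returns (rows, number of fragments that had to be probed)."""
        bucket = row_id % self.buckets
        found, probes = [], 0
        for fragment in self.fragments:
            probes += 1
            found.extend(r for r in fragment[bucket] if r["id"] == row_id)
        return found, probes

    def rebuild(self):
        """Merge all fragments into one, restoring a single bucket set sorted by id."""
        merged = [[] for _ in range(self.buckets)]
        for fragment in self.fragments:
            for bucket, rows in enumerate(fragment):
                merged[bucket].extend(rows)
        for rows in merged:
            rows.sort(key=lambda r: r["id"])
        self.fragments = [merged]
```

After ten single-row inserts a point lookup probes ten fragments; after `rebuild` it probes one. Batching inserts keeps the fragment count low in the first place, which matches the slide's advice.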
  • 48.
  • 49. Dips down to 1 active vertex at these times
  • 50.
  • 51.
  • 52.
  • 53. High-level Roadmap • Worldwide Region Availability (currently US and EU) • Interactive Access with T-SQL query • Scale out your custom code in the language of choice (.NET, Java, Python, etc.) • Process the data formats of your choice (incl. Parquet, ORC; larger string values) • Continued ADF, AAS, ADC, SQL DW, EventHub, SSIS integration • Administrative policies to control usage/cost for storage & compute • Secure data sharing between common AAD and public read-only sharing, fine-grained ACLing • Intense focus on developer productivity for authoring, debugging, and optimization • General customer feedback http://aka.ms/adlfeedback
  • 54.
  • 55. Resources http://usql.io http://blogs.msdn.microsoft.com/azuredatalake/ http://blogs.msdn.microsoft.com/mrys/ https://channel9.msdn.com/Search?term=U-SQL#ch9Search http://aka.ms/usql_reference https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-u-sql-programmability-guide https://docs.microsoft.com/en-us/azure/data-lake-analytics/ https://msdn.microsoft.com/en-us/magazine/mt614251 https://msdn.microsoft.com/magazine/mt790200 http://www.slideshare.net/MichaelRys Getting Started with R in U-SQL https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-u-sql-python-extensions https://social.msdn.microsoft.com/Forums/azure/en-US/home?forum=AzureDataLake http://stackoverflow.com/questions/tagged/u-sql http://aka.ms/adlfeedback Continue your education at Microsoft Virtual Academy online.

Editor's notes

  3. Shows simple EXTRACT and OUTPUT on a preview of a large file and on many small files to introduce fast file sets; then simple extensibility with string functions; then dynamic output.
  4. Par
  5. Show Views, TVFs and Tables
  6. https://github.com/Azure/usql/tree/master/Examples/AmbulanceDemos/AmbulanceDemos/5-Ambulance-StreamSets-PartitionedTables