Ce diaporama a bien été signalé.
Le téléchargement de votre SlideShare est en cours. ×

Azure SQL DWH

Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Prochain SlideShare
SQL Server 2016 - Stretch DB
SQL Server 2016 - Stretch DB
Chargement dans…3
×

Consultez-les par la suite

1 sur 51 Publicité

Plus De Contenu Connexe

Diaporamas pour vous (20)

Les utilisateurs ont également aimé (20)

Publicité

Similaire à Azure SQL DWH (20)

Plus récents (20)

Publicité

Azure SQL DWH

  1. 1. Azure SQL DWH Big data-as-a-service by
  2. 2. About me 4/4/2016 Azure SQL DWH  Shy Engelberg, CTO @  Email : Shy@Valinor.co.il  Phone : 054-771-711-5  Twitter : @ShyEngelberg 2 |
  3. 3. Agenda 4/4/2016 Azure SQL DWH  SQL DWH introduction  Architecture  Creating a DWH  Loading data  Tools  Summary 3 |
  4. 4. Objectives 4/4/2016 Azure SQL DWH  Know what Azure SQL Data warehouse is  Know how Azure SQL Data warehouse works  Know how to create and connect to Azure SQL Data warehouse  Know the basics tools and methods to get started with developing  Identify scenarios that this solution might suit. 4 |
  5. 5. The data warehouse fairytale  Once upon a time, data warehouse was an appliance who required fixed combinations of storage and compute, often underutilizing expensive resources. Meaning monstrous hardware was lying unused.
  6. 6. The data warehouse fairytale  Once upon a time, data warehouse was an appliance who required fixed combinations of storage and compute, often underutilizing expensive resources. Meaning monstrous hardware was lying unused.
  7. 7. What is Azure SQL Data warehouse 4/4/2016 Azure SQL DWH7 |  an enterprise-class, distributed database capable of processing massive volumes of relational and non-relational data.  It is the industry's first cloud data warehouse that combines proven SQL capabilities with the ability to grow, shrink, and pause in seconds.  Now on GA! (July 14th)
  8. 8. What is Azure SQL Data warehouse 4/4/2016 Azure SQL DWH8 |  an enterprise-class, capable of of and data.  It is the industry's first data warehouse that combines proven with the ability to in seconds.
  9. 9. What is Azure SQL Data warehouse 4/4/2016 Azure SQL DWH9 | – Azure PaaS – an MPP – up to PBs – a relational DB that can query also non-relational data – based on the product we know and love – use what you need, when you need it.
  10. 10. What is Azure SQL Data warehouse 4/4/2016 Azure SQL DWH10 |  Easily deploys in seconds.  Pay for query performance only when you need it (or you can pause it completely)  Fully managed service, removes the hassle of software patching, maintenance, back-ups.
  11. 11. What is Azure SQL Data warehouse 4/4/2016 Azure SQL DWH11 | SQL Data Warehouse uses Microsoft’s massively parallel processing (MPP) architecture. You pay for time-to-insight, not hardware. (details are a few slides ahead)
  12. 12. What is Azure SQL Data warehouse 4/4/2016 Azure SQL DWH12 | Using PolyBase, leverage Transact-SQL to query seamlessly across both relational data in a relational database and non-relational data in common Hadoop formats.
  13. 13. What is Azure SQL Data warehouse 4/4/2016 Azure SQL DWH13 | SQL Data Warehouse is based on the proven relational database engine of SQL Server and includes the features you expect, including stored procedures, UDF’s, partitioning, indexes, and collations. If you already know Transact-SQL, its easy to transfer your knowledge to SQL Data Warehouse.
  14. 14. What is Azure SQL Data warehouse 4/4/2016 Azure SQL DWH14 | You can grow or shrink compute power in minutes. Take full advantage of storage at cloud scale, and apply query compute based on changing performance needs. When compute is paused, you pay only for storage.
  15. 15. Architecture 4/4/2016 Azure SQL DWH15 |  At its core, SQL Data Warehouse uses Microsoft’s massive parallel processing (MPP) architecture, originally designed to run some of the largest on-premises enterprise data warehouses.
  16. 16. Architecture 4/4/2016 Azure SQL DWH16 |  At its core, SQL Data Warehouse uses Microsoft’s architecture, originally designed to run some of the largest on-premises enterprise data warehouses.
  17. 17. Architecture – MPP The coordinated processing of a program by multiple processors working on different parts of the program. Each processor has its own operating system and memory.
  18. 18. The MPP way Mission- process a lot of data The SMP way
  19. 19. Scale for better performance The SMP way The MPP way
  20. 20. Architecture – MPP
  21. 21. Architecture – MPP  Breaks a large queries across nodes for simultaneous processing.  Every node is “working” on a local subset of the data.  Capable of higher data ingestion rates through parallelization.  Scale horizontally by adding nodes, rather than moving to a server with more CPUs or higher storage capacity.  Unlike SMP – there is no single bottleneck.
  22. 22. Architecture – Azure SQL Data warehouse  SQL Data Warehouse independently scales compute and storage.  This concept is what allows us the ability to pause compute, scale performance in seconds, and pay only for the performance we need.
  23. 23. Architecture – Azure SQL Data warehouse  SQL Data Warehouse .  This concept is what allows us the ability to , and we need.
  24. 24. Architecture – Azure SQL Data warehouse Data management service Control node (MPP engine) Control DBs TempDBMaster SQL Server Azure blob storage Data management service Compute node 2 User Data SQL Server Data management service Compute node 1 User Data SQL Server
  25. 25. Architecture – Azure SQL Data warehouse Data management service Control node (MPP engine) Control DBs TempDBMaster SQL Server • “Controls" the system. • It is the front end that interacts with all applications and connections. • powered by SQL Database, and connecting to it looks and feels the same. • Under the surface, the Control node coordinates all of the data movement and computation required to run parallel queries on your distributed data. • When you submit a TSQL query to SQL Data Warehouse, the Control node transforms it into separate queries that will run on each Compute node in parallel.
  26. 26. Architecture – Azure SQL Data warehouse Data management service Control node (MPP engine) Control DBs TempDBMaster SQL Server Azure blob storage Data management service Compute node 2 User Data SQL Server Data management service Compute node 1 User Data SQL Server
  27. 27. Architecture – Azure SQL Data warehouse User Data SQL Server Data management service Compute node 1 User Data SQL Server  SQL Databases which process your query steps and manage your data.  The Compute nodes are the workers that run the parallel queries on your data.  After processing, they pass the results back to the Control node. To finish the query, the Control node aggregates the results and returns the final result.
  28. 28. Architecture – Azure SQL Data warehouse Data management service Control node (MPP engine) Control DBs TempDBMaster SQL Server Azure blob storage Data management service Compute node 2 User Data SQL Server Data management service Compute node 1 User Data SQL Server
  29. 29. Architecture – Azure SQL Data warehouse  Data Movement Service (DMS) is our technology for moving data between the nodes.  DMS gives the Compute nodes access to data they need for joins and aggregations.  DMS is not an Azure service. It is a Windows service that runs alongside SQL Database on all the nodes. Data management service
  30. 30. Architecture – Azure SQL Data warehouse Data management service Control node (MPP engine) Control DBs TempDBMaster SQL Server Azure blob storage Data management service Compute node 2 User Data SQL Server Data management service Compute node 1 User Data SQL Server
  31. 31. Architecture – Azure SQL Data warehouse Azure blob storage  Data is stored in Azure Storage Blobs.  When Compute nodes interact with data, they write and read directly to and from blob storage.  Since Azure storage expands transparently and limitlessly, SQL Data Warehouse can do the same.  Since compute and storage are independent, SQL Data Warehouse can automatically scale storage separately from scaling compute, and vice-versa. Azure Storage is fully fault tolerant.
  32. 32. Architecture – scaling  Since each compute node only works on a subset of the data, if we want to scale, all we need to do is add more compute nodes.  The “Magic” is that we can add more compute nodes without moving (redistributing) the data.  The scaling takes only a couple of minutes (initializing the compute node)
  33. 33. Architecture – scaling  Changing the amount of compute is as simple as moving a slider to the left or right, but can also be scheduled using T-SQL or PShell.  Compute usage in SQL Data Warehouse is measured using SQL Data Warehouse Units (DWUs).
  34. 34. Architecture – data distribution  All tables are distributed.  Each distribution is like a bucket; storing a unique subset of the data.  For now, SQL DW has 60 distributions. Each table is divided into 60 different distributions, from the moment it’s created. When there’s only one compute node, it holds all distributions, when there’s more, the distributions are spread among them.
  35. 35. Architecture – finally  Bring all the data you want, pay only for the storage.  If you want to query your data (dahhh), pay only for the compute you need, when you need.  Classic MPP design, scaling is almost linear.  We don’t really need to know how many nodes or distributes are there under the cover – it’s a PaaS, we are guaranteed a certain amount of performance.
  36. 36. Creating a DWH  Creating is simple as 1,2,3…  DWH is defined in a “Server” just like Azure SQL database.  Pause and scale are a button away. DEMO
  37. 37. Creating a table – distribution  When a table is created, it is spread across all of the distributions.  We need to choose how to distribute the data:  Hash  Round-robin (evenly but randomly, default behavior)
  38. 38. Creating a table – type  The default behavior is that a table is created with a clustered column store index. (which makes Azure SQL DWH, a columnar database)  Choose, what type of table during creation:
  39. 39. Creating a table – statistics  Statistics are not created automatically, we have to create them ourselves!  Statistics are not updated automatically!
  40. 40. Connecting and creating a table DEMO  Add firewall rule  Connect using SSMS  Create a table  Create statistics
  41. 41. Loading data  SQL Data Warehouse supports many loading methods:  SSIS  BCP  SQLBulkCopy API  Azure Data Factory PolyBase The “Push” methods – a query that goes through the “control node”, which becomes a bottleneck. (single-client gated) by far the fastest and most scalable SQL Data Warehouse loading method to date
  42. 42. Loading data – PolyBase  PolyBase is a scalable, query processing framework compatible with Transact-SQL that can be used to combine and bridge data across relational database management systems and Azure Blob Storage.  Currently PolyBase can load data from UTF-8 encoded delimited text files as well as the popular Hadoop file formats RC File, ORC, and Parquet. PolyBase can load data from gzip, zlib and Snappy compressed files.
  43. 43.  A “Pull” method.  Every compute node, has an HDFS bridge in the DMS service. Every bridge can parallel connect to external resources. Loading data – PolyBase
  44. 44.  PolyBase data loading is not limited by the Control node, and so as you scale out your DWU, your data transfer throughput also increases.  A recommended way of loading data: 1. write your source to CSV files 2. put the files in Azure Blob Storage 3. Load using PolyBase Loading data – PolyBase
  45. 45. Loading data – PolyBase DEMO  Query using PolyBase  CTAS to load data using PolyBase
  46. 46. This is SQL Server inside – you know it all.  Uses almost the same T-SQL.  Supports views, stored procedures, partitions and many other known and loved features.  Built-in HADR (it’s a PaaS, remember?)  Out-of-the-box backup and restore service. Azure SQL Data warehouse – features
  47. 47.  SSMS is not yet fully supported  (but very soon) – SSMS is supported (July 14th)!  Visual studio (SSDT) is supported   Integrates easily to Azure ML, PowerBI and Data Factory.  Many 3rd party solutions are available: Azure SQL Data warehouse – tools
  48. 48.  Best used for data processing scenarios, not as a data store all users can query, because:  Concurrency is very limited (like all MPPs)  Use to load data, process it, and move it to the next step or for individual queries that needs massive amount of processing. Azure SQL Data warehouse – usage
  49. 49. Azure SQL Data warehouse – Summary  A relational columnar DWH.  MPP service that allows us to scale compute in separate from storage.  We can pause the compute whenever needed.  Using the infinite power of the cloud, we can process as much data as we want, and use as much power as we want.  We don’t need to buy expensive hardware.
  50. 50. Azure SQL Data warehouse – Summary Best for:  Your data is already in Azure.  You need scheduled computing power for data processing.  You have a lot of data, but don’t want to spend a lot of money.  Not for concurrent querying!
  51. 51. THANK YOU! Questions?

×