22 April 2017
Loris Andaloro
Is traditional BI a thing of the past?
Impact of the Azure cloud revolution on BI and data in general
Ihor Leontiev
@LeontievIhor
blog.andaloro.fr
www.azug.fr
© 2017 AZUG FR. All Rights Reserved.
Meet the Team
@LeontievIhor
Agenda
Where to start? Because the choices matter
http://azureplatform.azurewebsites.net/en-us/
Pragmatic approach
Exploring the new possibilities of Azure
Pragmatic approach
Known on-premises architectures
• Data warehouse scenario
• Data lake scenario
“Think big, act small, fail fast. Learn rapidly”
Emergent architecture
• Develop an agnostic vision of the future
• Various tests for data acquisition and storage
• Stabilize the architecture
• Clean-up
• PoC
Classification of services
Integration / Processing / Storage / Presentation
Cleansing and aggregation
Classification of services
Enrichment and Curation
Integration / Processing / Storage / Presentation
Event Hubs
IoT Hubs
Service Bus
Kafka
HDInsight
ADLA
Storm
Spark
Stream Analytics
ADLS
Azure Storage
Azure SQL DB
Azure SQL DW
ADLS
Azure DW
Azure SQL DB
Hbase
Cassandra
Azure Storage
Power BI
Azure Data Factory Azure ML
Microsoft's mapping of services to known architectures
https://blogs.technet.microsoft.com/cansql/2015/06/03/microsoft-data-platform-overview/
Decision tree
The decision tree by Ivan Kosyakov, Data Platform Technical Architect at Microsoft
Does the theory work in practice?
https://pixabay.com/fr/l-homme-personne-visage-glasse-159771/
Focus: Data warehouse
ETL scenario
Let's review some of the services that seem useful or necessary in this scenario:
Azure SQL Database, Power BI, Azure Data Catalog, Azure Data Factory, Azure SQL Data Warehouse
Azure SQL Database
• Cloud relational database, powered by Microsoft SQL Server
• No infrastructure to manage
• Instant scaling, sizes up to 1 TB
• A perfect fit for simple data warehouse use
Azure SQL Database
Dynamic Data Masking
Transparent Data Encryption (TDE)
Azure SQL Database and its alternatives
Compared with the other storage services
Transactional processing
Rich queries
Managed service
Scaling
Accessible over the internet (HTTP/REST)
Non-relational data model
Flexible about data formats
Azure SQL Database
Monthly price for 10 GB of data (bar chart, axis from 0 to 450)
Columns: Table Storage, DocumentDB, SQL DB
Standard: 0.11, 7.13, 12.67
Premium: 21.08, 392
https://azure.microsoft.com/fr-fr/pricing/calculator/
Azure SQL Data Warehouse
Diagram: App Service, Azure SQL Database, Azure Machine Learning, Intelligent App, Hadoop, Azure SQL Data Warehouse, Power BI
• (Relational) data warehouse as a service
• Scales to petabytes of data
• Massively Parallel Processing
• Instant-on compute scales in seconds
• Query Relational / Non-Relational
Azure SQL Data Warehouse
http://www.jamesserra.com/archive/2016/08/azure-sql-database-vs-sql-data-warehouse/
Azure SQL Data Warehouse
Monthly price for 100 GB of data (bar chart, axis from 0 € to 60,000 €)
SQL DW, 6000 DWU: 57,000 €
SQL DB, 4000 DTU: 13,495 €
SQL DW, 100 DWU: 1,062 €
SQL DB, 10 DTU: 13 €
https://azure.microsoft.com/fr-fr/pricing/calculator/
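The price points on this slide can be turned into a cost per performance unit with a little arithmetic. The sketch below is only an illustration: the figures are copied from the slide, the function names are hypothetical, and DTU and DWU are different measures, so per-unit costs are only meaningful within one service.

```python
# Monthly prices in euros copied from the slide (100 GB of data).
# Sizes are DTU for SQL DB and DWU for SQL DW; the two units are not comparable.
PRICES_EUR = [
    ("SQL DB", 10, "DTU", 13.0),
    ("SQL DB", 4000, "DTU", 13495.0),
    ("SQL DW", 100, "DWU", 1062.0),
    ("SQL DW", 6000, "DWU", 57000.0),
]


def price_per_unit(rows):
    """Return {(service, size): euros per DTU or DWU per month}."""
    return {(svc, size): price / size for svc, size, _unit, price in rows}


def price_ratio(rows, service):
    """Price of the largest listed size divided by the price of the smallest."""
    by_size = sorted((size, price) for svc, size, _unit, price in rows if svc == service)
    return by_size[-1][1] / by_size[0][1]


if __name__ == "__main__":
    for (svc, size), unit_price in price_per_unit(PRICES_EUR).items():
        print(f"{svc} at size {size}: {unit_price:.2f} EUR per unit per month")
    for svc in ("SQL DB", "SQL DW"):
        print(f"{svc}: largest size costs {price_ratio(PRICES_EUR, svc):.0f}x the smallest")
```

Within SQL DW the cost per DWU stays close to 10 € at both ends of the range, whereas the cost per DTU for SQL DB rises from the smallest to the largest size shown.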
Power BI
Power BI
Modules and exchanges
Power BI
Gartner Magic Quadrant
Azure Data Factory
Cloud data integration service (ETL)
https://docs.microsoft.com/fr-fr/azure/data-factory/data-factory-introduction
ADF limitations that lead to SSIS
https://docs.microsoft.com/fr-fr/azure/data-factory/data-factory-introduction
Lessons learned and difficulties
Azure Data Catalog
Data-oriented search engine
Registration of central data sources
Self-service BI
Capture tribal knowledge
Azure Data Catalog
Focus: Data lake
Data lake scenario
Azure Data Lake service
• Store and manage infinite data
• Keep data in its original form
• High throughput, low latency analytic jobs
• Enterprise-grade security + access control
Data Lake service
Transformative way to store and process infinite data
Other analytic solutions, SQL Data Warehouse
Blob Storage Concepts
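A small model can make the hierarchy concrete: a storage account holds containers, each container holds blobs in a flat namespace, and a path separator in the blob name gives the appearance of folders. The sketch below emulates a prefix and delimiter listing in plain Python. It does not call the Azure SDK; the function name and the sample names other than MyGroup/MyBlob1 and MyGroup/MyBlob2 are hypothetical.

```python
def list_blobs(blob_names, prefix="", delimiter="/"):
    """Emulate a prefix/delimiter listing over a flat set of blob names.

    Returns (blobs, virtual_dirs): the blobs directly under `prefix`, and the
    virtual directories one level below it.
    """
    blobs, virtual_dirs = [], set()
    for name in sorted(blob_names):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _tail = rest.partition(delimiter)
        if sep:
            # The name continues past a delimiter: report only the virtual directory.
            virtual_dirs.add(prefix + head + delimiter)
        else:
            blobs.append(name)
    return blobs, sorted(virtual_dirs)


# A container is just a flat collection of names (hypothetical sample data).
container = ["MyGroup/MyBlob1", "MyGroup/MyBlob2", "Other/Blob3", "root.txt"]

if __name__ == "__main__":
    print(list_blobs(container))              # top level of the container
    print(list_blobs(container, "MyGroup/"))  # everything organized under MyGroup/
```

Listing with the prefix MyGroup/ returns the two blobs grouped under it, which is how a flat namespace can be browsed like a file system.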
Azure Files
Shared Network File Storage for Azure
Availability, durability, scalability are managed automatically
Supports two interfaces: SMB and REST
Azure Files vs Blobs
Durability options
    Azure Blobs: LRS, ZRS, GRS (and RA-GRS for higher availability)
    Azure Files: LRS, GRS
Accessibility
    Azure Blobs: REST APIs
    Azure Files: SMB 2.1 (standard file system APIs), REST APIs
Connectivity
    Azure Blobs: REST - Worldwide
    Azure Files: SMB 2.1 - Within region, REST - Worldwide
Endpoints
    Azure Blobs: http://myaccount.blob.core.windows.net/mycontainer/myblob
    Azure Files: \\myaccount.file.core.windows.net\myshare\myfile.txt
                 http://myaccount.file.core.windows.net/myshare/myfile.txt
Directories
    Azure Blobs: Flat namespace; however, prefix listing can simulate virtual directories
    Azure Files: True directory objects
Case sensitivity of names
    Azure Blobs: Case sensitive
    Azure Files: Case insensitive, but case preserving
Capacity
    Azure Blobs: Up to 500 TB containers
    Azure Files: 5 TB file shares
Throughput
    Azure Blobs: Up to 60 MB/s per blob
    Azure Files: Up to 60 MB/s per share
Object size
    Azure Blobs: Up to 1 TB/blob
    Azure Files: Up to 1 TB/file
Billed capacity
    Azure Blobs: Based on bytes written
    Azure Files: Based on file size
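The endpoint patterns in the comparison can be expressed as small helpers. This is a minimal sketch that only formats strings following the patterns shown on the slide; the function names are hypothetical and nothing here contacts a storage account.

```python
def blob_url(account: str, container: str, blob: str) -> str:
    """REST endpoint of a blob, following the pattern on the slide."""
    return f"http://{account}.blob.core.windows.net/{container}/{blob}"


def file_url(account: str, share: str, path: str) -> str:
    """REST endpoint of a file in an Azure Files share."""
    return f"http://{account}.file.core.windows.net/{share}/{path}"


def file_unc_path(account: str, share: str, path: str) -> str:
    """SMB (UNC) path of the same file, using backslashes as separators."""
    host = f"{account}.file.core.windows.net"
    return "\\\\" + host + "\\" + share + "\\" + path.replace("/", "\\")


if __name__ == "__main__":
    print(blob_url("myaccount", "mycontainer", "myblob"))
    print(file_url("myaccount", "myshare", "myfile.txt"))
    print(file_unc_path("myaccount", "myshare", "myfile.txt"))
```

The same file in a share is reachable through two addresses, one per interface, while a blob only has its REST address.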
Focus: Big data
Relational DB vs. Hadoop
Comparison criteria: data size, access, updates, structure, integrity, scaling
Hadoop ecosystem
Distributed Storage (HDFS)
Query (Hive)
Distributed Processing (MapReduce)
ODBC
Legend: ■ Core Hadoop ■ Data processing ■ Data Movement ■ Packages
HDInsight and Hadoop
Hadoop Core + Hive, Pig, HBase
C#, F#, .NET
Azure Storage (WASB)
Office 365 Power BI (Excel, PowerQuery, PowerView, BI Sites)
World's Data (Azure Data Marketplace)
ODBC
Sqoop for SQL Server
PowerShell
Example architectures
Example 1
Azure Virtual Machine
Azure Blob Storage, Azure SQL Database, Power BI
Azure Data Catalog
PowerShell script
Azure Storage Explorer
Manual upload
FTP
Azure Blob Storage API
Example 2
Azure Virtual Machine
Azure SQL Database, Power BI
Azure Data Catalog
Azure SQL Database
Azure Data Factory
Source database
Example 3
Azure Web App (FTP)
Azure Data Factory
Azure Virtual Machine
Azure Table Storage, Azure SQL Database, Power BI
Azure Data Catalog
Example 4
Example 5
Example 6
Areas for improvement
• Replace SSIS with a PaaS solution
• Add Master Services, perhaps in Data Catalog
• A real FTP solution
• Standard SSIS connector for Table Storage
Conclusion
Overview
Overview
With Azure services
SQL Database
SQL Data Warehouse
Data Lake
Storage
SQL Server in an IaaS VM
Conclusion
Q & A
Thanks to our sponsors
PLATINUM
LOCAL
MEDIA PARTNERS
International sponsors
Follow us
Facebook
facebook.com/groups/azugfr
Twitter
twitter.com/AZUGFR
Meetup
meetup.com/AZUG-FR/
Web
www.azug.fr
Twitter
twitter.com/MugLyon
Web
https://muglyon.github.io
Meetup
meetup.com/MugLyon
Thank you for coming
See you soon!

Gab17 Lyon - Is traditional BI a thing of the past? Impact of the Azure cloud revolution on BI and data in general, by Ihor Leontiev and Loris Andaloro


Editor's notes

  • #32 Speaker notes: HDFS for the Cloud: The Azure Data Lake is a Hadoop File System compatible with HDFS enabling Microsoft offerings such as Azure HDInsight, Revolution-R Enterprise, industry Hadoop distributions like Hortonworks and Cloudera all to connect to it. Petabyte files, massive throughput: The goal of the data lake is to run Hadoop and advanced analytics on all your data to discover conclusions from the data itself.  Curated data: Azure Data Lake can also serve as a repository for lower cost data preparation prior to moving curated data into a data warehouse such as Azure Data Warehouse.
  • #35 Slide objectives: understand the hierarchy of Blob storage. Speaker notes: The Blob service provides storage for entities, such as binary files and text files. A storage account can be a combination of Tables, Blobs and Queues. A storage account will contain one or many containers. Each container can contain one or more blobs. The REST API for the Blob service exposes two resources: containers and blobs. A container is a set of blobs; every blob must belong to a container. The Blob service defines two types of blobs: block blobs, which are optimized for streaming, and page blobs, which are optimized for random read/write operations and which provide the ability to write to a range of bytes in a blob. Notes (http://msdn.microsoft.com/en-us/library/dd573356.aspx): Using the REST API for the Blob service, developers can create a hierarchical namespace similar to a file system. Blob names may encode a hierarchy by using a configurable path separator. For example, the blob names MyGroup/MyBlob1 and MyGroup/MyBlob2 imply a virtual level of organization for blobs. The enumeration operation for blobs supports traversing the virtual hierarchy in a manner similar to that of a file system, so that you can return a set of blobs that are organized beneath a group. For example, you can enumerate all blobs organized under MyGroup/.
  • #36 The Server Message Block (SMB) Protocol is a network file sharing protocol, and as implemented in Microsoft Windows is known as Microsoft SMB Protocol. The set of message packets that defines a particular version of the protocol is called a dialect. The Common Internet File System (CIFS) Protocol is a dialect of SMB.
  • #37 Emphasize the Capacity, Throughput and Object size fields
  • #41 MapReduce breaks the data down and sends the pieces to different computers for processing. Together these computers form a cluster; Hadoop incorporates this framework and calls them Hadoop clusters. MapReduce is analogous to GROUP BY in SQL. Hive is a SQL-like query syntax. Pig is a scripting language for expressing MapReduce jobs.
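The GROUP BY analogy in the last note can be illustrated with a toy, single-process version of the three MapReduce phases. This is a sketch for intuition only, not Hadoop code: a real cluster distributes the map and reduce work across machines, and the sample data and function names below are hypothetical.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and yield its (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Hypothetical sample: (city, amount) rows.
sales = [("Lyon", 10), ("Paris", 5), ("Lyon", 7)]

# Rough equivalent of: SELECT city, SUM(amount) FROM sales GROUP BY city
totals = run_job(
    sales,
    mapper=lambda row: [(row[0], row[1])],
    reducer=lambda _city, amounts: sum(amounts),
)

if __name__ == "__main__":
    print(totals)
```

The mapper picks the grouping key, the shuffle step plays the role of GROUP BY, and the reducer is the aggregate function.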