Building big data applications on Azure
Pranav Rastogi / Bharath Sreenivas
Microsoft
pranav.rastogi@microsoft.com
@rustd / @bharathbs
Reason over any data, anywhere
Flexibility of choice
Security and privacy
Data warehouses, Data lakes, Operational databases, Hybrid
Social, LOB, Graph, IoT, Image, CRM
Apps + insights
Ingest, Store, Prep & train, Model & serve
Data orchestration and monitoring
Big data store
Hadoop/Spark and machine learning
Data warehouse
Different Big Data Solutions
Solution scenarios
Three scenarios that take optimal advantage of big data
Modern DW
“We want to incorporate all of our data, including ‘big data’, with our data warehouse.”
Advanced Analytics
“We are trying to predict when our customers churn.”
Internet of Things (IoT)
“We are trying to get insights from our devices in real time, etc.”
Big Data Warehouse
ERP, CRM, and other LOB data
OLTP and other RDBMS
Clickstream, logs, and events
Sensors, social, weather, and other unstructured data
ETL
Azure Data Lake Analytics (U-SQL)
Azure Storage / Azure Data Lake
Azure HDInsight (Hadoop / Spark)
PolyBase
Azure SQL Data Warehouse
Governance and Master Data Management
Data Quality and Lineage
Azure Analysis Services (BI models)
Power BI (reports and dashboards)
Analyst, Power User, Data Engineer, Data Scientist
Advanced Analytics and AI
OLTP and other RDBMS
Clickstream, logs, and events
Sensors, social, weather, and other unstructured data
Data wrangling tools
REPL and machine learning tools
Azure Data Lake Analytics (U-SQL)
Azure Storage / Azure Data Lake
Azure HDInsight (Hadoop / Spark)
Deep Learning & Cognitive Services
ML models and scoring APIs
Azure Cosmos DB
Apps (web, mobile, bots), automated systems, people
Data Engineer, Data Scientist
Realtime Processing with Lambda Architecture
IoT sensors and/or user activity streams
Social, trends, weather, etc.
Clickstream, batch files, server logs, images, videos, and other unstructured data
Event broker/buffer queue: Azure Event Hubs, Apache Kafka
Azure Stream Analytics / Spark Streaming: clean, curate, aggregate; combine reference data; perform scoring from ML models
Reference data: Azure SQL DB / Cosmos DB
Trained machine learning models: Azure ML / R
Azure Data Lake Analytics (U-SQL)
Azure Storage / Azure Data Lake
Azure HDInsight (Hadoop / Spark)
Power BI realtime dashboards
Automated systems
Analyst, Data Engineer, Data Scientist
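In the Lambda architecture on this slide, a batch layer (Data Lake Analytics or HDInsight over the full dataset in storage) runs alongside a speed layer (Stream Analytics or Spark Streaming over recent events). Queries merge the two views. The sketch below shows that merge in plain Scala. The per-device event count metric and the `LambdaSketch` names are hypothetical, and no Azure or Spark API is involved.

```scala
// Minimal Lambda-architecture sketch (hypothetical metric: events per device).
// Batch layer: recomputes a view from all events up to a cutoff time.
// Speed layer: incrementally counts events that arrived after the cutoff.
// Serving layer: answers queries by merging both views.
object LambdaSketch {
  final case class Event(deviceId: String, timestamp: Long)

  // Batch view: full recomputation over the master dataset (events <= cutoff).
  def batchView(events: Seq[Event], cutoff: Long): Map[String, Long] =
    events
      .filter(_.timestamp <= cutoff)
      .groupBy(_.deviceId)
      .map { case (id, es) => id -> es.size.toLong }

  // Speed view: one incremental update per newly arrived event.
  def speedUpdate(view: Map[String, Long], e: Event): Map[String, Long] =
    view.updated(e.deviceId, view.getOrElse(e.deviceId, 0L) + 1L)

  def speedView(events: Seq[Event], cutoff: Long): Map[String, Long] =
    events
      .filter(_.timestamp > cutoff)
      .foldLeft(Map.empty[String, Long])((view, e) => speedUpdate(view, e))

  // Query-time merge of the batch and realtime views.
  def query(batch: Map[String, Long], speed: Map[String, Long], deviceId: String): Long =
    batch.getOrElse(deviceId, 0L) + speed.getOrElse(deviceId, 0L)
}
```

In a real deployment, the batch view would be rebuilt on a schedule. Speed-layer results older than the new cutoff would then be discarded.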
Advanced analytics and big data impact all verticals
Heartland Bank prevents fraud and boosts profits
The UK NHS transforms healthcare with faster access to information
City of Barcelona boosts citizen engagement with an intelligent app
Jet.com transforms customer engagement with a truly personalized experience
Rolls-Royce decreases costs with predictive maintenance
Manufacturing: Eliminate downtime and increase efficiency by enabling better predictive maintenance for your capital assets.
Banking: Minimize losses with more accurate fraud detection, and assess exposure to asset, credit, and market risk using a holistic approach.
Healthcare: Boost operational efficiency and improve the patient care experience with intelligent detection and timely service.
Government: Empower citizens and improve their engagement with relevant information and personalized citizen services.
Retail: Turn individual customer interactions into contextual engagements and increase customer satisfaction with highly personalized offers and content.
Open Source Analytics for the Enterprise
Managed open source analytics for the cloud with a 99.9% SLA
100% open source
Clusters up and running in minutes
63% lower TCO than deploying your own Hadoop on-premises
Separation of compute and storage lets you scale clusters up and down independently of your data, reducing costs
Big data is hard
Buy servers, install OSS, secure, configure, optimize, debug, success, scale up
HDInsight makes it easy
Browse to the Azure Portal, provide cluster details, and get an HDInsight cluster:
• 100% open source
• Optimized
• Highly available
• Secure
• Scalable
• Dedicated
• Managed
• Certified ISVs
• Customizable
Multi-Region Availability
Available in more than 25 regions worldwide
Launched most recently in the US West 2 and UK regions
Available in the China, Europe, and US Government clouds
Deploy globally within minutes
Security and Compliance to Enable OSS for Enterprises
Perimeter-level security: Virtual Networks; Network Security Groups (firewalls)
Authentication: Azure Active Directory; Kerberos authentication
Authorization: Apache Ranger; RBAC for admin; POSIX ACLs for the data plane
Data security: server-side encryption at rest; HTTPS/TLS in transit
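The data-plane authorization item refers to POSIX-style ACLs. As a rough illustration, the sketch below implements only the classic POSIX permission-bit check. It picks exactly one class (owner, else owning group, else other) and tests that class's bit. Named-user and named-group ACL entries and the ACL mask are deliberately not modeled. The `Entry` type and all user and group names are hypothetical.

```scala
// Classic POSIX permission-bit check (base owner/group/other bits only).
// Extended ACL entries (named users/groups, mask) are NOT modeled here.
object PosixPermissionSketch {
  sealed abstract class Access(val bit: Int)
  case object Read extends Access(4)
  case object Write extends Access(2)
  case object Execute extends Access(1)

  // Hypothetical file/folder entry: owner, owning group, and mode bits.
  final case class Entry(owner: String, group: String, mode: Int)

  // Parse an octal mode string such as "750" (Scala has no octal literals).
  def octal(s: String): Int = Integer.parseInt(s, 8)

  // Exactly one class applies: the owner bits if the user is the owner,
  // else the group bits if the user is in the owning group, else the "other" bits.
  def allowed(e: Entry, user: String, userGroups: Set[String], access: Access): Boolean = {
    val shift =
      if (user == e.owner) 6
      else if (userGroups.contains(e.group)) 3
      else 0
    ((e.mode >> shift) & access.bit) != 0
  }
}
```

With mode `750`, the owner can read, write, and execute. Members of the owning group can read and execute, and everyone else has no access.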
Rich Developer Ecosystem
Plugins for HDInsight are available for the most popular IDEs, for agile development and debugging
Rich support for the powerful notebooks data scientists use
Develop in C# and deploy on Linux in Java via SCP.NET, a technology developed by the HDInsight team
Remote debugging for Spark jobs
Recognized by Top Analysts
Forrester Wave for Big Data Hadoop Cloud
• Named an industry leader by Forrester, with the most comprehensive, scalable, and integrated platforms*
• Recognized for its cloud-first strategy that is paying off*
*The Forrester Wave™: Big Data Hadoop Cloud Solutions, Q2 2016.
Simplified pricing process now takes minutes instead of days
Competitive pricing, product demand, the costs of materials, gas, and labor, and thousands of other market variables affect product cost and customer demand for products or services around the world. That is why accurate and profitable pricing is one of the most difficult business challenges for many companies. Manufacturing, distribution, services, and airline companies look to the science and technology provided by PROS to keep their pricing accurate, competitive, and profitable. The PROS Guidance product runs enormously complex pricing calculations based on variables that comprise multiple terabytes of data. To handle this computational complexity and data volume, and then deliver specific results to its clients quickly, PROS built its services on top of Azure HDInsight.
Products and services: Microsoft Azure, Azure HDInsight
Organization size: 1,000
Country: United States
Industry / business need: Pricing Software-as-a-Service; Other - unsegmented
Apache Spark for Azure HDInsight
HDInsight architecture
Client machines
HDInsight cluster: gateway nodes, head nodes, worker nodes, edge nodes, Zookeeper nodes
Hive metastore: Azure SQL Database
Azure Storage or Data Lake Store
Scale compute & storage independently
HDInsight cluster (compute): gateway nodes, head nodes, worker nodes, edge nodes, Zookeeper nodes
Storage: Azure Blob Storage or Azure Data Lake Store
Persist & Reuse your data
• Your data is outside the HDInsight cluster.
• Hence data is persisted even if you drop and recreate the cluster.
• Create multiple clusters and point them to the same storage.
Multiple HDInsight clusters sharing one Azure Blob Storage or Azure Data Lake Store account
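Because the data lives outside the cluster, a cluster can be created for a batch window and dropped afterwards without losing data. The back-of-the-envelope sketch below estimates the resulting compute cost difference. It assumes a flat, hypothetical per-node hourly rate and a 30-day month, so it is not Azure pricing. Storage cost is left out because it is paid either way.

```scala
// Back-of-the-envelope compute cost comparison (hypothetical rates, not Azure pricing).
object ClusterCostSketch {
  val HoursPerMonth: Double = 24.0 * 30 // assumes a 30-day month

  // Compute cost = nodes x hours x hourly rate per node.
  def computeCost(nodes: Int, hours: Double, ratePerNodeHour: Double): Double =
    nodes * hours * ratePerNodeHour

  // Always-on cluster vs. one created for `jobHoursPerDay` each day and then dropped.
  // Storage is excluded: it persists (and is billed) in both cases.
  def monthlySavings(nodes: Int, jobHoursPerDay: Double, ratePerNodeHour: Double): Double =
    computeCost(nodes, HoursPerMonth, ratePerNodeHour) -
      computeCost(nodes, jobHoursPerDay * 30, ratePerNodeHour)
}
```

For example, take 4 nodes at a hypothetical 1.0 per node-hour with jobs running 6 hours a day. The always-on cluster costs 2,880 a month and the on-demand one costs 720, a difference of 2,160.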
Create cluster using Azure CLI
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-create-linux-clusters-azure-cli
azure hdinsight cluster create -g groupname -l WestUS -y Linux --clusterType Hadoop \
  --defaultStorageAccountName storagename.blob.core.windows.net --defaultStorageAccountKey storagekey \
  --defaultStorageContainer clustername --workerNodeCount 3 --userName admin --password httppassword \
  --sshUserName sshuser --sshPassword sshuserpassword clustername
HDInsight Spark cluster (jobs)
Azure Blob Storage, Azure SQL Data Warehouse, Azure SQL Database, Azure Data Lake Store, Azure Cosmos DB
Storage (files/folders)
1. Create cluster
2. Submit jobs
6. Drop cluster
1. Data Lake with no limits
HDInsight Spark cluster (streaming jobs)
Kafka, Event Hub, Apache Flume
Azure Blob Storage, Azure Data Lake Store, Azure Cosmos DB, Azure SQL Database, HBase, Azure Redis Cache (push/pull)
Web app, Mobile, Bot
Storage (files/folders, Hive partitions), Azure SQL Data Warehouse, Azure SQL Database
Presto, HDInsight (Spark SQL), HDInsight (Interactive Hive), HDInsight (Spark streaming), HDInsight (Spark batch), HDInsight (AtScale)
Data Sources
Step 1, “mapper”: reads from HDFS, writes to HDFS
Step 2, “reducer”: reads from HDFS, writes to HDFS
Step 1: reads from and writes to HDFS
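The two-step flow above is classic MapReduce. A mapper emits key/value pairs, the framework groups them by key (the shuffle), and a reducer aggregates each group. The sketch below is a minimal in-memory word count using plain Scala collections, and the `WordCountMapReduce` names are hypothetical. Unlike Hadoop, it keeps intermediate results in memory instead of writing them to HDFS between steps, which is the cost the diagram highlights.

```scala
// In-memory word count structured as map -> shuffle -> reduce (no HDFS I/O).
object WordCountMapReduce {
  // Map phase: each input line -> (word, 1) pairs.
  def mapper(line: String): Seq[(String, Int)] =
    line.toLowerCase.split("\\W+").toSeq.filter(_.nonEmpty).map(w => (w, 1))

  // Shuffle: group intermediate pairs by key.
  def shuffle(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

  // Reduce phase: sum the counts for one key.
  def reducer(word: String, counts: Seq[Int]): (String, Int) = (word, counts.sum)

  def run(lines: Seq[String]): Map[String, Int] =
    shuffle(lines.flatMap(mapper)).map { case (w, cs) => reducer(w, cs) }
}
```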
Read 1 MB sequentially from disk: 20,000,000 ns
Read 1 MB sequentially from SSD: 1,000,000 ns
Read 1 MB sequentially from memory: 250,000 ns
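These per-megabyte figures explain why keeping intermediate data in memory matters. Sequential reads from memory are 80 times faster than from disk and 4 times faster than from SSD. The sketch below derives those ratios and the time to scan a hypothetical 100 GB dataset at each tier. It assumes a single sequential reader and ignores parallelism and caching.

```scala
// Derives ratios and scan times from the per-MB sequential read latencies on the slide.
object ReadLatency {
  val DiskNsPerMb: Long = 20000000L
  val SsdNsPerMb: Long = 1000000L
  val MemoryNsPerMb: Long = 250000L

  // How many times slower than memory a storage tier is.
  def slowdownVsMemory(nsPerMb: Long): Long = nsPerMb / MemoryNsPerMb

  // Single-reader time to scan `gb` gigabytes sequentially (1 GB = 1024 MB), in seconds.
  def scanSeconds(gb: Long, nsPerMb: Long): Double = (gb * 1024L * nsPerMb) / 1e9
}
```

For 100 GB, that works out to roughly 2,048 s (about 34 minutes) from disk, 102.4 s from SSD, and 25.6 s from memory.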
RDD → RDD → RDD → RDD: transformations
RDD → value: actions
Spark 1.x
Spark 2.x
// Run in spark-shell, where `sc` (the SparkContext) is predefined;
// in Spark 2.x it is also available as `spark.sparkContext`
val file = sc.textFile("wasb://...")
val errors = file.filter(line => line.contains("ERROR"))
// Cache errors
errors.cache()
// Count all the errors
errors.count()
// Count errors mentioning MySQL
errors.filter(line => line.contains("MySQL")).count()
// Fetch the MySQL errors as an array of strings
errors.filter(line => line.contains("MySQL")).collect()
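Transformations such as `filter` only describe a new RDD. Nothing is read until an action such as `count` or `collect` runs, and an uncached RDD is recomputed for every action, which is why the snippet calls `cache()`. The toy model below reproduces that behavior in plain Scala. It is not Spark's API, and the `Pipeline` type is hypothetical.

```scala
// Toy model of Spark-style lazy evaluation (not Spark's API).
// Transformations record work; actions execute the whole recorded chain.
object LazyPipeline {
  final case class Pipeline(source: Seq[String], steps: Vector[String => Boolean] = Vector.empty) {
    // Transformation: returns a new description; no data is touched yet.
    def filter(p: String => Boolean): Pipeline = copy(steps = steps :+ p)

    // Action: runs every recorded predicate over the source each time it is called.
    // There is no cache, so repeated actions redo all the work.
    def collect(): Seq[String] = source.filter(line => steps.forall(step => step(line)))

    // Action built on collect().
    def count(): Long = collect().size.toLong
  }
}
```

Declaring a filter runs no predicate. Each call to `count()` or `collect()` re-runs the full chain over the source.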
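SQL / DataFrame / Dataset
Unresolved Logical Plan → Analysis (using the Catalog) → Logical Plan
Logical Plan → Logical Optimization → Optimized Logical Plan
Optimized Logical Plan → Physical Planning → Physical Plans
Physical Plans → Cost Model → Selected Physical Plan
Selected Physical Plan → Code Generation → RDDs
Dataset example row: 123, “apache”, “spark”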
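In the logical optimization phase, the optimizer rewrites the plan's expression trees using rules, and constant folding is a typical one. The sketch below implements such a rule with pattern matching in plain Scala. The `Expr`, `Literal`, `Attribute`, and `Add` types are standalone stand-ins that only echo Catalyst's naming, not Spark classes.

```scala
// Standalone toy expression tree and one rule-based rewrite (constant folding).
object TinyOptimizer {
  sealed trait Expr
  final case class Literal(value: Int) extends Expr
  final case class Attribute(name: String) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr

  // Fold Add(Literal, Literal) into a single Literal, rewriting children first (bottom-up)
  // so that nested constant subtrees collapse in one pass.
  def constantFold(e: Expr): Expr = e match {
    case Add(l, r) =>
      (constantFold(l), constantFold(r)) match {
        case (Literal(a), Literal(b)) => Literal(a + b)
        case (fl, fr)                 => Add(fl, fr)
      }
    case other => other
  }
}
```

For example, `x + (1 + 2)` becomes `x + 3`, and `(1 + 2) + 4` becomes `7`. Attribute references stay untouched because their values are only known at run time.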
HDInsight Spark cluster (jobs): Azure Blob Storage, Azure SQL Data Warehouse, Azure SQL Database, Azure Data Lake Store, Azure Cosmos DB
HDInsight R Server cluster: request/response with Web app, Mobile, Bot
HDInsight Spark cluster (streaming jobs): Kafka, Event Hub; Azure Blob Storage, Azure Data Lake Store, Azure Cosmos DB, Azure SQL Database, HBase, Azure Redis Cache (push/pull); Web app, Mobile, Bot; Power BI real-time dashboard
Peace of mind; Speed and scalability; Flexibility
100% compatible with open source R
Wide range of scalable and distributed R functions
Ability to parallelize R functions
Sample data: "http://www.ats.ucla.edu/stat/data/binary.csv" → “/data/binary.csv”
Cluster name: pranavstratalab# for 1-30; pranavstratalab# for 30-45; pranavstratalab## for 45-70
Cluster URL: https://pranavstratalab##.azurehdinsight.net
Notebooks URL: https://pranavstratalab##.azurehdinsight.net/jupyter/tree
Cluster login user: admin
Cluster password: Abc!1234567890
Phone tracking across cell sites
Connected car: remote management & diagnostics
Asset tracking
Fleet management
Facilities management
Personnel tracking & crowd control
Ride sharing
Geofencing
Racecar telemetry
Connected manufacturing
and many more…
Big Data Architecture
Data Sources → Ingest → Prepare (normalize, clean, etc.) → Analyze (stat analysis, ML, etc.) → Publish (for programmatic consumption, BI/visualization) → Consume (alerts, operational stats, insights)
Layers: Data Consumption (Ingestion), Data Processing, Presentation/Serving Layer
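The stages above compose naturally: each one consumes the previous stage's output. The sketch below models Ingest, Prepare, Analyze, and Publish as plain Scala functions chained with `andThen`. The `sensorId,value` record format and the per-stage logic are hypothetical examples, and `toDoubleOption` requires Scala 2.13 or later.

```scala
// Sketch of the Ingest -> Prepare -> Analyze -> Publish stages as composed functions.
// Record format ("sensorId,value") and the stage logic are hypothetical examples.
object PipelineStages {
  final case class Reading(sensor: String, value: Double)

  // Ingest: parse raw "sensorId,value" records; malformed records are dropped.
  val ingest: Seq[String] => Seq[Reading] = raws =>
    raws.flatMap { raw =>
      raw.split(",") match {
        case Array(s, v) => v.trim.toDoubleOption.map(d => Reading(s.trim, d))
        case _           => None
      }
    }

  // Prepare: normalize sensor ids and drop negative (invalid) readings.
  val prepare: Seq[Reading] => Seq[Reading] = rs =>
    rs.map(r => r.copy(sensor = r.sensor.toLowerCase)).filter(_.value >= 0)

  // Analyze: mean value per sensor.
  val analyze: Seq[Reading] => Map[String, Double] = rs =>
    rs.groupBy(_.sensor).map { case (s, group) => s -> group.map(_.value).sum / group.size }

  // Publish: stable, sorted output for downstream consumers (dashboards, alerts).
  val publish: Map[String, Double] => Seq[String] = stats =>
    stats.toSeq.sortBy(_._1).map { case (s, mean) => s"$s=$mean" }

  val run: Seq[String] => Seq[String] = ingest andThen prepare andThen analyze andThen publish
}
```

Keeping each stage a separate function means any one of them can be swapped out without touching the others. A real system might, for example, replace `analyze` with a Spark job.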
Data Processing: realtime analytics, interactive analytics, batch analytics
Machine learning (Spark + Azure ML): failure and RCA predictions
HDI + ISVs: OLAP for data warehousing
HDI custom ETL: aggregate/partition
Power BI dashboard (shared with field ops, customers, MIS, and engineers)
Realtime machine learning (anomaly detection)
Cosmos DB
Interactive HDInsight clusters
Big data storage: Azure Data Lake Store, Cosmos DB, Azure Blob Storage
Data scientists, BI analysts
Big Data Applications
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-kafka-high-availability
Kafka Cost Estimator (chart: cost in $ vs. throughput in MBps; non-managed disks vs. managed disks)
Kafka scale forecast (chart: number of Kafka nodes vs. throughput in MBps; Kafka nodes with OS VHDs vs. managed disks)
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-kafka-mirroring
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-kafka-connect-vpn-gateway
Azure VNet Boundary
Microsoft Databus (Siphon) usage
8 million events per second peak ingress
800 TB (10 GB per sec) ingress per day
1,800 production Kafka brokers; 450 topics
15 sec 99th percentile latency
Key customer scenarios
Ads Monetization (Fast BI)
O365 Customer Fabric NRT: tenant & user insights
Bing NRT Operational Intelligence
Presto (Fast SML) interactive analysis
Delve Analytics
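As a quick consistency check, 800 TB per day averages to roughly 9.3 GB per second, using decimal units and 86,400 seconds per day. That agrees with the rounded "10 GB per sec" figure. The sketch below does the conversion in both directions, and the `IngressRate` names are hypothetical.

```scala
// Converts between daily ingress volume and average per-second rate (decimal units).
object IngressRate {
  val SecondsPerDay: Long = 24L * 60 * 60 // 86,400

  // Average GB/s implied by `tbPerDay` terabytes per day (1 TB = 1e12 bytes, 1 GB = 1e9 bytes).
  def avgGBps(tbPerDay: Double): Double = tbPerDay * 1e12 / SecondsPerDay / 1e9

  // TB/day implied by a sustained rate of `gbps` GB/s.
  def tbPerDay(gbps: Double): Double = gbps * 1e9 * SecondsPerDay / 1e12
}
```

Conversely, a sustained 10 GB/s would be about 864 TB per day, so the two numbers agree to within rounding.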
Chart: Siphon data volume (ingress and egress), throughput in GBps, Jan-15 to Dec-16; series: volume published, volume subscribed
Chart: Siphon events per second (ingress and egress), throughput in millions of events per second, Jan-15 to Dec-16; series: EPS in, EPS out
Siphon
Asia DC, Europe DC, US DC: each with Zookeeper, Canary, and Kafka
Collector; Agent; Services Data Pull (Agent); Services Data Push; Device Proxy Services
Consumer API (push/pull): streaming, batch, audit trail
Legend: Open Source; Microsoft Internal
Tool: Purpose
Ambari: dashboard for monitoring the health and status of the Hadoop cluster
YARN UI: monitor YARN applications and logs
Tez View: track and debug the execution of jobs
Grafana: workload-specific JMX metrics
Spark History Server: displays both completed and incomplete Spark jobs
HMaster UI: HBase's web-based user interface for monitoring your HBase cluster
Visual Studio / VS Code: monitor job status with Data Lake Tools; remote debugging of Spark jobs
OMS Agent for Linux on HDInsight nodes (head, worker, Zookeeper)
FluentD HDInsight plugin:
1. Plugin for `in_tail` for all logs; a regexp turns each line into a JSON object
2. Filter for WARN and above for each log type (`grep` filter plugin)
3. Output to the `out_oms_api` type
4. Exec plugin for metrics
Per-workload config (omsconfig): HBase, Spark, Hive, Storm, Kafka
Log Analytics (OMS) Service
HDInsight cluster: gateway nodes, head nodes, worker nodes, edge nodes, Zookeeper nodes
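Steps 1 and 2 of the FluentD plugin can be mimicked in a few lines: tail each log, turn lines into JSON-like records with a regexp, and keep WARN and above. The plain-Scala sketch below assumes a hypothetical log4j-style line layout (`date time LEVEL [thread] message`) and hypothetical field names. The real plugin is configured in FluentD rather than written in Scala, and its actual regexps may differ.

```scala
// Mimics "regexp -> JSON-like record" parsing plus a "WARN and above" filter.
// The line layout and field names are assumptions, not the HDInsight plugin's config.
object LogFilterSketch {
  // Hypothetical layout: "2017-09-27 10:15:01,456 WARN [main] slow disk"
  private val LinePattern =
    """^(\S+ \S+) (TRACE|DEBUG|INFO|WARN|ERROR|FATAL) \[([^\]]*)\] (.*)$""".r

  private val severityRank =
    Map("TRACE" -> 0, "DEBUG" -> 1, "INFO" -> 2, "WARN" -> 3, "ERROR" -> 4, "FATAL" -> 5)

  // Step 1: parse a line into a record (field -> value); unparseable lines yield None.
  def parse(line: String): Option[Map[String, String]] = line match {
    case LinePattern(time, level, thread, message) =>
      Some(Map("time" -> time, "level" -> level, "thread" -> thread, "message" -> message))
    case _ => None
  }

  // Step 2: keep records whose level is WARN or more severe.
  def warnOrAbove(record: Map[String, String]): Boolean =
    record.get("level").flatMap(severityRank.get).exists(_ >= severityRank("WARN"))

  def process(lines: Seq[String]): Seq[Map[String, String]] =
    lines.flatMap(parse).filter(warnOrAbove)
}
```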
HDInsight security: rings of defense
Perimeter level security: virtual network; network security (i.e. firewalls); gateway; service tunneling
Authentication: Kerberos; Active Directory
Authorization: Hive policies; HBase policies; file and folder level ACLs
Data security: encryption @ rest
Perimeter level security: using virtual network and gateway service
Perimeter level security: Virtual Network and Gateway (HDInsight cluster, head node)
Perimeter level security: Network Security Group (HDInsight cluster, head node; Contoso server, Microsoft IP; Storage, SQL)
Authentication: integration with Azure Active Directory (Kerberos, Active Directory)
Authorization: application and data-level authorization (Hive policies, HBase policies, file and folder level ACLs)
HDInsight cluster (head node): domain credentials, Kerberos ticket, OAuth ticket; Kerberos AuthN; LDAP
Authorization: workload and storage (WASB/ADLS)
Active Directory Domain Services; Azure VNET-to-VNET peering; SAS keys; Apache Ranger
Data security: encryption @ rest & in transit (transparent server-side encryption)
Azure Data Lake Storage: always-on transparent encryption; all reads/writes are encrypted/decrypted; service-managed as well as customer-managed keys; encryption at rest and encryption in transit
Microsoft Azure Storage Blob: always-on transparent encryption; all reads/writes are encrypted/decrypted; service-managed as well as customer-managed keys; encryption at rest and encryption in transit
https://azure.microsoft.com/en-us/services/hdinsight/
https://docs.microsoft.com/en-us/azure/hdinsight/
https://aka.ms/hdinsighttraining
THANK YOU
Pranav Rastogi / Bharath Sreenivas
Microsoft
@rustd / @bharathbs

Contenu connexe

Tendances

Designing big data analytics solutions on azure
Designing big data analytics solutions on azureDesigning big data analytics solutions on azure
Designing big data analytics solutions on azureMohamed Tawfik
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseJames Serra
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureMark Kromer
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureDmitry Anoshin
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBaseJames Serra
 
Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes John Archer
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Con LA
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick viewRajesh Nadipalli
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouseJames Serra
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure DatabricksJames Serra
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Michael Rys
 
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...Lace Lofranco
 
Data lake – On Premise VS Cloud
Data lake – On Premise VS CloudData lake – On Premise VS Cloud
Data lake – On Premise VS CloudIdan Tohami
 
Global AI Bootcamp Madrid - Azure Databricks
Global AI Bootcamp Madrid - Azure DatabricksGlobal AI Bootcamp Madrid - Azure Databricks
Global AI Bootcamp Madrid - Azure DatabricksAlberto Diaz Martin
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategyJames Serra
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...Mark Rittman
 
Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudDataWorks Summit
 
Azure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu Ganta
Azure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu GantaAzure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu Ganta
Azure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu GantaDatabricks
 

Tendances (20)

Designing big data analytics solutions on azure
Designing big data analytics solutions on azureDesigning big data analytics solutions on azure
Designing big data analytics solutions on azure
 
Azure HDInsight
Azure HDInsightAzure HDInsight
Azure HDInsight
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBase
 
Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure Databricks
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...
 
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
 
Data lake – On Premise VS Cloud
Data lake – On Premise VS CloudData lake – On Premise VS Cloud
Data lake – On Premise VS Cloud
 
Global AI Bootcamp Madrid - Azure Databricks
Global AI Bootcamp Madrid - Azure DatabricksGlobal AI Bootcamp Madrid - Azure Databricks
Global AI Bootcamp Madrid - Azure Databricks
 
Introduction to Azure HDInsight
Introduction to Azure HDInsightIntroduction to Azure HDInsight
Introduction to Azure HDInsight
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
 
Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the Cloud
 
Azure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu Ganta
Azure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu GantaAzure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu Ganta
Azure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu Ganta
 

Similaire à Big Data on Azure Tutorial

Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesJames Serra
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsStreamsets Inc.
 
#DataOnCloud New York Event
#DataOnCloud New York Event#DataOnCloud New York Event
#DataOnCloud New York EventHARMAN Services
 
Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure Abhimanyu Singhal
 
Build Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightBuild Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightDataWorks Summit
 
4. aws enterprise summit seoul 기존 엔터프라이즈 it 솔루션 클라우드로 이전하기 - thomas park
4. aws enterprise summit seoul   기존 엔터프라이즈 it 솔루션 클라우드로 이전하기 - thomas park4. aws enterprise summit seoul   기존 엔터프라이즈 it 솔루션 클라우드로 이전하기 - thomas park
4. aws enterprise summit seoul 기존 엔터프라이즈 it 솔루션 클라우드로 이전하기 - thomas parkAmazon Web Services Korea
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptxFedoRam1
 
Azure Overview Arc
Azure Overview ArcAzure Overview Arc
Azure Overview Arcrajramab
 
SendGrid Improves Email Delivery with Hybrid Data Warehousing
SendGrid Improves Email Delivery with Hybrid Data WarehousingSendGrid Improves Email Delivery with Hybrid Data Warehousing
SendGrid Improves Email Delivery with Hybrid Data WarehousingAmazon Web Services
 
Big Data Companies and Apache Software
Big Data Companies and Apache SoftwareBig Data Companies and Apache Software
Big Data Companies and Apache SoftwareBob Marcus
 
Azure Overview Csco
Azure Overview CscoAzure Overview Csco
Azure Overview Cscorajramab
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure CloudCaserta
 
Benefits of the Azure cloud
Benefits of the Azure cloudBenefits of the Azure cloud
Benefits of the Azure cloudJames Serra
 
Financial Services Analytics on AWS
Financial Services Analytics on AWSFinancial Services Analytics on AWS
Financial Services Analytics on AWSAmazon Web Services
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationMatthew W. Bowers
 
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data AnalyticsHow to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data AnalyticsInformatica
 
Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020Riccardo Zamana
 
Cloud computing adoption in sap technologies
Cloud computing adoption in sap technologiesCloud computing adoption in sap technologies
Cloud computing adoption in sap technologiessveldanda
 
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...Certus Solutions
 
Aws what is cloud computing deck 08 14 13
Aws what is cloud computing deck 08 14 13Aws what is cloud computing deck 08 14 13
Aws what is cloud computing deck 08 14 13Amazon Web Services
 

Similaire à Big Data on Azure Tutorial (20)

Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use Cases
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
 
#DataOnCloud New York Event
#DataOnCloud New York Event#DataOnCloud New York Event
#DataOnCloud New York Event
 
Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure
 
Build Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightBuild Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsight
 
4. aws enterprise summit seoul 기존 엔터프라이즈 it 솔루션 클라우드로 이전하기 - thomas park
4. aws enterprise summit seoul   기존 엔터프라이즈 it 솔루션 클라우드로 이전하기 - thomas park4. aws enterprise summit seoul   기존 엔터프라이즈 it 솔루션 클라우드로 이전하기 - thomas park
4. aws enterprise summit seoul 기존 엔터프라이즈 it 솔루션 클라우드로 이전하기 - thomas park
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
 
Azure Overview Arc
Azure Overview ArcAzure Overview Arc
Azure Overview Arc
 
SendGrid Improves Email Delivery with Hybrid Data Warehousing
SendGrid Improves Email Delivery with Hybrid Data WarehousingSendGrid Improves Email Delivery with Hybrid Data Warehousing
SendGrid Improves Email Delivery with Hybrid Data Warehousing
 
Big Data Companies and Apache Software
Big Data Companies and Apache SoftwareBig Data Companies and Apache Software
Big Data Companies and Apache Software
 
Azure Overview Csco
Azure Overview CscoAzure Overview Csco
Azure Overview Csco
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
 
Benefits of the Azure cloud
Benefits of the Azure cloudBenefits of the Azure cloud
Benefits of the Azure cloud
 
Financial Services Analytics on AWS
Financial Services Analytics on AWSFinancial Services Analytics on AWS
Financial Services Analytics on AWS
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar Presentation
 
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data AnalyticsHow to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
 
Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020
 
Cloud computing adoption in sap technologies
Cloud computing adoption in sap technologiesCloud computing adoption in sap technologies
Cloud computing adoption in sap technologies
 
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
 
Aws what is cloud computing deck 08 14 13
Aws what is cloud computing deck 08 14 13Aws what is cloud computing deck 08 14 13
Aws what is cloud computing deck 08 14 13
 

Big Data on Azure Tutorial

  • 1. Building big data applications on Azure Pranav Rastogi/ Bharath Sreenivas Microsoft pranav.rastogi@microsoft.com @rustd/ @bharathbs
  • 3. Security and privacy; Flexibility of choice; Reason over any data, anywhere. Data warehouses, data lakes, and operational databases, including hybrid deployments. Data sources: Social, LOB, Graph, IoT, Image, CRM.
  • 4. Apps + insights from Social, LOB, Graph, IoT, Image, and CRM data. Stages: INGEST (data orchestration and monitoring), STORE (big data store), PREP & TRAIN (Hadoop/Spark and machine learning), MODEL & SERVE (data warehouse).
  • 5. Different Big Data Solutions
  • 6. Solution scenarios: three scenarios that take optimal advantage of big data. Modern DW: “We want to incorporate all of our data, including ‘big data’, with our data warehouse.” Advanced Analytics: “We are trying to predict when our customers churn.” Internet of Things (IoT): “We are trying to get insights from our devices in real time, etc.”
  • 7. Big Data Warehouse. Data sources: ERP, CRM, and other LOB data; OLTP and other RDBMS; clickstream, logs, and events; sensors, social, weather, and other unstructured data. Components: ETL; Azure Storage / Azure Data Lake; Azure Data Lake Analytics (U-SQL); Azure HDInsight (Hadoop / Spark); Azure SQL Data Warehouse; PolyBase; Azure Analysis Services (BI models); Power BI (reports and dashboards). Cross-cutting: governance and master data management; data quality and lineage. Users: analyst, power user, data engineer, data scientist.
  • 8. Advanced Analytics and AI. Data sources: OLTP and other RDBMS; clickstream, logs, and events; sensors, social, weather, and other unstructured data. Storage and processing: Azure Storage / Azure Data Lake; Azure Data Lake Analytics (U-SQL); Azure HDInsight (Hadoop / Spark). Tools: REPL and machine learning tools; data wrangling tools; deep learning & cognitive services. Serving: Azure Cosmos DB; ML models and scoring APIs. Consumers: apps, automated systems, and people (web, mobile, bots). Users: data engineer, data scientist.
  • 9. Realtime Processing with Lambda Architecture. Data sources: IoT sensors and/or user activity streams; social, trends, weather, etc.; clickstream, batch files, server logs, images, videos, and other unstructured data. Event broker/buffer queue: Azure Event Hubs, Apache Kafka. Stream processing with Azure Stream Analytics / Spark Streaming: clean, curate, aggregate; combine reference data; perform scoring from ML models. Reference data: Azure SQL DB / Cosmos DB. Trained machine learning models: Azure ML / R. Batch path: Azure Storage / Azure Data Lake, Azure Data Lake Analytics (U-SQL), Azure HDInsight (Hadoop / Spark). Outputs: Power BI realtime dashboards; automated systems. Users: analyst, data engineer, data scientist.
  • 10. Advanced analytics and big data impact all verticals. Customer stories: Heartland Bank prevents fraud and boosts profits; the UK NHS transforms healthcare with faster access to information; the City of Barcelona boosts citizen engagement with an intelligent app; Jet.com transforms customer engagement with a truly personalized experience; Rolls-Royce decreases costs with predictive maintenance. Manufacturing: eliminate downtime and increase efficiency by enabling better predictive maintenance for your capital assets. Banking: minimize losses with more accurate fraud detection, and assess exposure to asset, credit, and market risk using a holistic approach. Healthcare: boost operational efficiency and improve the patient care experience with intelligent detection and timely service. Government: empower citizens and improve their engagement with relevant information and personalized citizen services. Retail: turn individual customer interactions into contextual engagements and increase customer satisfaction with highly personalized offers and content.
  • 12. Open Source Analytics for the Enterprise: managed open source analytics for the cloud with a 99.9% SLA; 100% open source; clusters up and running in minutes; 63% lower TCO than deploying your own Hadoop on-premises; separation of compute and storage lets you scale clusters independently of your data to reduce costs.
  • 13. Big data is hard. The path to success: buy servers, install OSS, secure, configure, optimize, debug, scale up.
  • 14. HDInsight makes it easy: browse to the Azure portal, provide cluster details, and get an HDInsight cluster that is 100% open source, optimized, highly available, secure, scalable, dedicated, managed, backed by certified ISVs, and customizable.
  • 15. Deploy globally within minutes. Multi-region availability: available in more than 25 regions worldwide; launched most recently in US West 2 and the UK regions; available in the China, Europe, and US Government clouds.
  • 16. Security and compliance to enable OSS for enterprises. Perimeter-level security: virtual networks; network security groups (firewalls). Authentication: Azure Active Directory; Kerberos authentication. Authorization: Apache Ranger; RBAC for admin; POSIX ACLs for the data plane. Data security: server-side encryption at rest; HTTPS/TLS in transit.
  • 17. Rich developer ecosystem: HDInsight plugins for most popular IDEs, for agile development and debugging; rich support for the notebooks that data scientists use; develop in C# and deploy on Linux in Java via SCP.Net, a technology developed by HDInsight; remote debugging for Spark jobs.
  • 18. Recognized by top analysts. Forrester Wave for Big Data Hadoop Cloud: named an industry leader by Forrester, with the most comprehensive, scalable, and integrated platforms*; recognized for a cloud-first strategy that is paying off*. *The Forrester Wave™: Big Data Hadoop Cloud Solutions, Q2 2016.
  • 19. PROS: simplified pricing process now takes minutes instead of days. Competitive pricing, product demand, the costs of materials, gas, and labor, and thousands of other market variables affect product cost and customer demand for products and services around the world. That is why accurate and profitable pricing is one of the most difficult business challenges for many companies. Manufacturing, distribution, services, and airline companies look to the science and technology provided by PROS to keep their pricing accurate, competitive, and profitable. The PROS Guidance product runs enormously complex pricing calculations based on variables that comprise multiple terabytes of data. To handle this calculation complexity and data volume, and then deliver specific results to its clients quickly, PROS built its services on top of Azure HDInsight. Products and services: Microsoft Azure, Azure HDInsight, Apache Spark for Azure HDInsight. Organization size: 1,000. Industry: Other (unsegmented). Country: United States. Business need: pricing software-as-a-service.
  • 20. HDInsight architecture: client machines connect to an HDInsight cluster made up of gateway nodes, head nodes, worker nodes, edge nodes, and ZooKeeper nodes. The Hive metastore is kept in an Azure SQL database, and data lives in Azure Storage or Data Lake Store.
  • 21. Scale compute and storage independently: the cluster's gateway, head, worker, edge, and ZooKeeper nodes scale separately from Azure Blob Storage or Azure Data Lake Store.
  • 22. Persist and reuse your data: your data lives outside the HDInsight cluster, so it persists even if you drop and re-create the cluster. You can create multiple clusters and point them to the same storage (Azure Blob Storage or Azure Data Lake Store).
  • 24. Create a cluster using the Azure CLI (see https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-create-linux-clusters-azure-cli):
      azure hdinsight cluster create -g groupname -l WestUS -y Linux --clusterType Hadoop --defaultStorageAccountName storagename.blob.core.windows.net --defaultStorageAccountKey storagekey --defaultStorageContainer clustername --workerNodeCount 3 --userName admin --password httppassword --sshUserName sshuser --sshPassword sshuserpassword clustername
  • 27. HDInsight Spark cluster jobs read from and write to Azure Blob Storage, Azure Data Lake Store, Azure SQL Database, Azure SQL Data Warehouse, and Azure Cosmos DB.
  • 28. HDInsight Spark cluster jobs over files/folders in storage: Azure Blob Storage, Azure Data Lake Store, Azure SQL Database, Azure SQL Data Warehouse, Azure Cosmos DB.
  • 29. Cluster lifecycle with separate storage: 1. Create the HDInsight Spark cluster. 2. Submit jobs. 3. Drop the cluster; the data remains in storage.
  • 31. 1. Data Lake with no limits
  • 35. HDInsight Spark cluster running streaming jobs (diagram). Clients: web app, mobile, bot. Event ingestion: Kafka, Event Hub (push/pull). Stores: Azure Blob Storage, Azure Data Lake Store, Azure Cosmos DB, Azure SQL Database, HBase, Azure Redis Cache.
  • 37. Ingestion: Apache Flume, Kafka, Event Hub. Processing: HDInsight (Spark streaming), HDInsight (Spark batch). Storage: files/folders and Hive partitions, Azure SQL Data Warehouse, Azure SQL Database. Query: Presto, HDInsight (Spark SQL), HDInsight (Interactive Hive), HDInsight (AtScale).
  • 41. MapReduce: step 1 (“mapper”) reads from HDFS and writes to HDFS; step 2 (“reducer”) reads from HDFS and writes to HDFS. Spark: step 1 reads and writes from HDFS. Typical latencies:
      Read 1 MB sequentially from disk      20,000,000 ns
      Read 1 MB sequentially from SSD        1,000,000 ns
      Read 1 MB sequentially from memory       250,000 ns
  • 44.
      // sc: the SparkContext (predefined in spark-shell)
      val file = sc.textFile("wasb://...")
      val errors = file.filter(line => line.contains("ERROR"))
      // Cache errors
      errors.cache()
      // Count all the errors
      errors.count()
      // Count errors mentioning MySQL
      errors.filter(line => line.contains("MySQL")).count()
      // Fetch the MySQL errors as an array of strings
      errors.filter(line => line.contains("MySQL")).collect()
  • 46. Spark SQL query planning: SQL, DataFrame, or Dataset input → unresolved logical plan → analysis (using the catalog) → logical plan → logical optimization → optimized logical plan → physical planning → physical plans → cost model → selected physical plan → code generation → RDDs.
  • 56. HDInsight R Server cluster serving request/response calls from web apps, mobile apps, and bots.
  • 57. HDInsight Spark cluster running streaming jobs (diagram). Clients: web app, mobile, bot. Event ingestion: Kafka, Event Hub (push/pull). Stores: Azure Blob Storage, Azure Data Lake Store, Azure Cosmos DB, Azure SQL Database, HBase, Azure Redis Cache. Visualization: Power BI real-time dashboard.
  • 59. Peace of mind; speed and scalability; flexibility.
  • 60. 100% compatible with open source R; wide range of scalable and distributed R functions; ability to parallelize R functions.
  • 64. Cluster names: pranavstratalab# (1-30), pranavstratalab# (30-45), pranavstratalab## (45-70). Cluster URL: https://pranavstratalab##.azurehdinsight.net. Notebooks URL: https://pranavstratalab##.azurehdinsight.net/jupyter/tree. Cluster login user: admin. Cluster password: Abc!1234567890.
  • 68. Phone tracking across cell sites, connected car (remote management & diagnostics), asset tracking, fleet management, facilities management, personnel tracking & crowd control, ride sharing, geofencing, racecar telemetry, connected manufacturing, and many more.
  • 69. Big data architecture: data sources → ingest → prepare (normalize, clean, etc.) → analyze (stat analysis, ML, etc.) → publish (for programmatic consumption, BI/visualization) → consume (alerts, operational stats, insights). Layers: data consumption (ingestion), data processing, presentation/serving.
  • 70. Big data architecture: data sources → ingest → prepare (normalize, clean, etc.) → analyze (stat analysis, ML, etc.) → publish (for programmatic consumption, BI/visualization) → consume (alerts, operational stats, insights). Data processing: real-time analytics, interactive analytics, and batch analytics, using machine learning (Spark + Azure ML) for failure and RCA predictions; HDI + ISVs OLAP for data warehousing; HDI custom ETL (aggregate/partition); real-time machine learning (anomaly detection); Cosmos DB; interactive HDInsight clusters. Big data storage: Azure Data Lake Store, Cosmos DB, Azure Blob Storage. Presentation: Power BI dashboard (shared with field ops, customers, MIS, and engineers) and big data applications, used by data scientists and BI analysts.
  • 75. Kafka cost estimator (chart: cost in $ vs. throughput in MBps, non-managed disks vs. managed disks). Kafka scale forecast (chart: number of Kafka nodes vs. throughput in MBps, Kafka nodes with OS VHDs vs. Kafka nodes with managed disks).
  • 80. Microsoft Databus (Siphon) usage: 8 million events per second peak ingress; 800 TB (10 GB per second) ingress per day; 1,800 production Kafka brokers; 450 topics; 15 s 99th percentile latency. Key customer scenarios: Ads monetization (Fast BI); O365 Customer Fabric NRT (tenant & user insights); Bing NRT operational intelligence; Presto (Fast SML) interactive analysis; Delve Analytics. Charts, January 2015 to December 2016: Siphon data volume, ingress and egress (throughput in GBps; volume published vs. volume subscribed); Siphon events per second, ingress and egress (throughput in millions of events per second; EPS in vs. EPS out).
  • 81. Siphon architecture: Asia, Europe, and US datacenters, each with ZooKeeper, Canary, and Kafka. Producers feed a Collector via services data pull (agent), services data push, and device proxy services. Consumers use the Consumer API (push/pull) for streaming, batch, and audit trail. Legend: open source; Microsoft internal.
  • 86. Monitoring tools (tool: purpose):
      Ambari: dashboard for monitoring the health and status of the Hadoop cluster
      YARN UI: monitor YARN applications and logs
      Tez View: track and debug the execution of jobs
      Grafana: workload-specific JMX metrics
      Spark History Server: displays both completed and incomplete Spark jobs
      HMaster UI: HBase's web-based user interface for monitoring an HBase cluster
      Visual Studio / VS Code: monitor job status with Data Lake Tools; remote debugging of Spark jobs
  • 89. OMS Agent for Linux on HDInsight nodes (head, worker, ZooKeeper) runs FluentD with an HDInsight plugin: 1. in_tail plugin for all logs, using a regexp to create a JSON object; 2. grep filter plugin that keeps WARN and above for each log type; 3. output to the out_oms_api type; 4. exec plugin for metrics. Per-workload configs (HBase, Spark, Hive, Storm, Kafka) and omsconfig; data goes to the Log Analytics (OMS) service.
  • 101. HDInsight security: rings of defense. Perimeter-level security: virtual network, network security (i.e., firewalls), gateway service, tunneling. Authentication: Kerberos, Active Directory. Authorization: Hive policies, HBase policies, file- and folder-level ACLs. Data security: encryption at rest.
  • 102. Perimeter-level security using a virtual network and gateway service: virtual network, network security (i.e., firewalls), gateway service, tunneling.
  • 103. Perimeter-level security: virtual network and gateway (diagram: HDInsight cluster head node).
  • 104. Perimeter-level security: network security group (diagram: HDInsight cluster head node; Contoso server; Microsoft IP; storage; SQL).
  • 105. Authentication: integration with Azure Active Directory (Kerberos, Active Directory).
  • 106. Authorization: application- and data-level authorization (Hive policies, HBase policies, file- and folder-level ACLs).
  • 107. Diagram: HDInsight cluster head node; domain credentials; Kerberos ticket; OAuth ticket; Kerberos AuthN; LDAP; authorization for workloads and storage (WASB/ADLS); Active Directory Domain Services; Azure VNet-to-VNet peering; SAS keys.
  • 111. Data security: transparent server-side encryption; encryption at rest and in transit.
  • 112. Transparent server-side encryption. Azure Data Lake Storage and Microsoft Azure Storage Blob both provide: always-on transparent encryption; encryption/decryption of all reads/writes; service-managed keys as well as customer-managed keys; encryption at rest and encryption in transit.
  • 114. THANK YOU Pranav Rastogi/ Bharath Sreenivas Microsoft @rustd/ @bharathbs
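
The error-search example on slide 44 and speaker notes 3-5 rest on two ideas: transformations such as filter are lazy and run only when an action needs a result, and caching lets later actions reuse data held in memory. Slide 41's table hints at why this matters: reading 1 MB from memory (250,000 ns) is about 80 times faster than reading it from disk (20,000,000 ns). The sketch below is a local, single-machine analogue, not Spark code. It assumes a plain Scala collection view can stand in for an RDD, and the names LazyErrorSearch and Result, along with the sample log lines, are hypothetical. A counter records how often the ERROR predicate runs, which makes the laziness visible.

```scala
// Local analogue of the slide-44 pipeline using plain Scala collections.
// Assumption: a collection view stands in for an RDD; this is not Spark code.
object LazyErrorSearch {

  // Predicate-call counts at each stage, plus the search results.
  final case class Result(
      callsBeforeAction: Int, // predicate calls after defining the lazy filter
      callsAfterCache: Int,   // predicate calls after materializing it once
      callsAtEnd: Int,        // predicate calls after the follow-up queries
      errorCount: Int,
      mysqlErrors: List[String]
  )

  def search(lines: Seq[String]): Result = {
    var calls = 0 // how many times the ERROR predicate has been evaluated

    // "Transformation": a lazy view only records the filter; nothing runs yet.
    val errorsView = lines.view.filter { line =>
      calls += 1
      line.contains("ERROR")
    }
    val before = calls

    // Rough stand-in for errors.cache() followed by the first action:
    // evaluate the filter once and keep the result in memory.
    val errors: Vector[String] = errorsView.toVector
    val afterCache = calls

    // Follow-up "actions" work on the in-memory Vector, so the
    // ERROR predicate is not evaluated again.
    val errorCount = errors.size
    val mysqlErrors = errors.filter(_.contains("MySQL")).toList

    Result(before, afterCache, calls, errorCount, mysqlErrors)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample lines standing in for the wasb:// input file.
    val sample = Seq(
      "INFO  web request served",
      "ERROR MySQL connection refused",
      "WARN  slow response",
      "ERROR disk quota exceeded",
      "ERROR MySQL deadlock detected"
    )
    val r = search(sample)
    println(s"predicate calls: before=${r.callsBeforeAction} " +
      s"afterCache=${r.callsAfterCache} end=${r.callsAtEnd}")
    println(s"errors=${r.errorCount} mysql=${r.mysqlErrors.size}")
    r.mysqlErrors.foreach(println)
  }
}
```

On the sample input, main reports before=0, afterCache=5, and end=5, followed by errors=3 and mysql=2. The analogy is approximate. Spark's cache() is itself lazy and keeps partitions spread across executors, whereas toVector here is eager and local.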

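Slide 9 and speaker note 7 refer to the Lambda architecture, in which a batch path and a real-time (speed) path are combined before results are served. The Scala code below is a minimal sketch of that general pattern, not of the specific Azure pipeline on the slides. It uses per-key event counts: the batch layer recomputes a view from the full master dataset, the speed layer increments a real-time view as new events arrive, and the serving layer answers a query by adding the two views together. The names LambdaServing, batchView, speedUpdate, and query, as well as the device IDs, are hypothetical.

```scala
// Minimal Lambda-architecture sketch: batch view + real-time view, merged at query time.
// Hypothetical names; per-key counting stands in for any aggregate.
object LambdaServing {
  type Counts = Map[String, Long]

  // Batch layer: recompute the whole view from the immutable master dataset.
  def batchView(masterData: Seq[String]): Counts =
    masterData.groupBy(identity).map { case (key, events) => key -> events.size.toLong }

  // Speed layer: update the real-time view incrementally, one event at a time.
  def speedUpdate(view: Counts, event: String): Counts =
    view.updated(event, view.getOrElse(event, 0L) + 1L)

  // Serving layer: merge both views to answer a query.
  def query(batch: Counts, realtime: Counts, key: String): Long =
    batch.getOrElse(key, 0L) + realtime.getOrElse(key, 0L)

  def main(args: Array[String]): Unit = {
    val history = Seq("car-1", "car-2", "car-1") // already in the master dataset
    val fresh = Seq("car-1", "car-3")            // arrived after the last batch run

    val batch = batchView(history)
    val realtime = fresh.foldLeft(Map.empty[String, Long])(speedUpdate)
    println(query(batch, realtime, "car-1")) // 2 from batch + 1 from real time

    // Once the next batch run absorbs the fresh events, the real-time view is
    // discarded and the merged answer stays the same.
    val nextBatch = batchView(history ++ fresh)
    println(query(nextBatch, Map.empty, "car-1"))
  }
}
```

Running main prints 3 twice. The first value comes from merging the batch and real-time views, and the second comes after the next batch run has absorbed the fresh events. On the slide-9 architecture, the batch layer would correspond to HDInsight (Hadoop / Spark) jobs and the speed layer to Azure Stream Analytics or Spark Streaming.
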
Speaker notes

  1. All kinds of data are being generated and stored on-premises and in the cloud, but the vast majority is in hybrid environments. Customers want to reason over all this data without having to move it. They want a choice of platforms and languages, plus privacy and security. <Transition> Microsoft’s offering
  2. https://aka.ms/sg4qzc https://customers.microsoft.com/doclink/pros
  3. Objective: This slide describes how Apache Spark's architecture differs from MapReduce, allowing it to offer better performance for data sharing. Table source: https://gist.github.com/jboner/2841832 Talking points: Spark provides primitives for in-memory cluster computing. A Spark job can load and cache data in memory and query it repeatedly, much more quickly than disk-based systems. Spark integrates with the Scala programming language so you can manipulate distributed datasets like local collections; there is no need to structure everything as map and reduce operations. Data sharing between operations is faster because data is in memory. Hadoop shares data through HDFS, which is expensive, and HDFS also maintains three replicas. Spark keeps data in memory without replication by default.
  4. Objective: This slide explains the two types of operations that RDDs support: transformations and actions. Talking points: Transformations create a new dataset from an existing one. They do not compute their results right away; they are computed only when an action requires a result to be returned to the driver program. Once an RDD is persisted, later actions reuse the stored data instead of recomputing it. Examples include map, filter, sample, union, and more. Actions return a value to the driver program after running a computation on the dataset. Examples include reduce, collect, count, first, foreach, and more.
  5. Objective: This slide shows how transformations and actions are used to search through error messages. Talking points: Cache errors: marks the errors RDD to be kept in memory once it is computed. Count all errors: counts all the errors in the data. Count errors mentioning MySQL: counts only the errors that mention MySQL. Fetch the MySQL errors as an array of strings: returns the MySQL errors to the driver as an array of strings.
  6. Event detection in real time: financial engines, connected car (sensors), fire. Data landing for learning. Use cases: connected car; insurance companies for connected driving.
  7. What are the three big components that you need to stand up? Ask: Who knows what Lambda architecture is? Who has helped implement one? Walk through the verticals: ingest, prep + analyze, serve, consume. The horizontals are driven by speed: real time vs. batch.
  9. Let’s walk through an example of this.
  10. We will demo this soon
  11. We will demo this soon
  12. TODO – add logos for Bing Ads, Office365, Delve Analytics
  13. How do we monitor all of our resources across subscriptions with a single pane of glass? How do we analyze Hadoop logs and metrics easily? How do we set up alerting?
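
Slide 89 and speaker note 13 describe shipping HDInsight logs to Log Analytics (OMS) through FluentD. Each log is tailed, a regular expression turns every line into a structured record, and only WARN and above is kept. The Scala sketch below illustrates just those two steps: parsing with a regex and filtering by severity. It is not the FluentD plugin or its configuration. The log4j-style line layout, the severity ordering, and the names WarnAndAboveFilter and LogRecord are assumptions for illustration, and the code assumes Scala 2.13 or later.

```scala
import scala.util.matching.Regex

// Sketch of "parse with a regexp, then keep WARN and above".
// Assumption: log4j-style lines of the form "<date> <time> <LEVEL> <message>".
object WarnAndAboveFilter {

  // Structured record built from one log line (FluentD would emit this as JSON).
  final case class LogRecord(timestamp: String, level: String, message: String)

  // Severity ranks, lowest to highest.
  private val severity: Map[String, Int] =
    Seq("TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL").zipWithIndex.toMap

  private val LinePattern: Regex =
    """^(\S+ \S+) (TRACE|DEBUG|INFO|WARN|ERROR|FATAL) (.*)$""".r

  // Lines that do not match the assumed layout are dropped (None).
  def parse(line: String): Option[LogRecord] = line match {
    case LinePattern(ts, level, msg) => Some(LogRecord(ts, level, msg))
    case _                           => None
  }

  // Keep only records whose level is WARN or more severe.
  def warnAndAbove(lines: Seq[String]): Seq[LogRecord] =
    lines.flatMap(parse).filter(r => severity(r.level) >= severity("WARN"))

  def main(args: Array[String]): Unit = {
    // Hypothetical sample lines.
    val sample = Seq(
      "2017-03-15 10:01:02,123 INFO Starting container",
      "2017-03-15 10:01:03,456 WARN Slow disk detected",
      "not a log line",
      "2017-03-15 10:01:04,789 ERROR Task failed"
    )
    warnAndAbove(sample).foreach(println)
  }
}
```

On the sample input, only the WARN and ERROR records are printed. The INFO line fails the severity check, and the malformed line is dropped at the parsing step.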