SlideShare une entreprise Scribd logo
1  sur  44
Storage
Azure HDInsight
R Server
Local (HDFS) or Cloud (Azure Blob/Azure Data Lake Store)
Analytics
Azure HDInsight
Hadoop Meets the Cloud
Microsoft’s managed Hadoop as a Service
100% open source Apache Hadoop
Built on the latest releases across Hadoop (2.7)
Up and running in minutes with no hardware to deploy
Run on Windows or Linux
Supported by Microsoft
Rockwell Automation is partnered with one of the six oil and gas
super majors to build unmanned internet-connected gas
dispensers. Each dispenser emits real-time management metrics
allowing them to detect anomalies and predict when proactive
maintenance needs to occur.
Store sensor data every 5 minutes
 Temperature, pressure, vibration, etc.
 Tens of thousands of data points / second
Azure Blobs
Azure HDInsight
Hive, Pig,
Azure SQL DB
Power BI for O365
Mobile Notification Hub
Mobile Device
Real-time notification
JustGiving wanted to harness
the power of their data by using
network science to map people’s
connections and relationships so
that they could connect people
with the causes they care about.
Based on 15
years of data, the JustGiving
GiveGraph is the world’s
largest ecosystem of giving
behavior. It contains more
than 81 million person nodes,
thousands of causes and 285
million connections and is the
engine that drives JustGiving’s
social platform, enabling levels
of personalization and
engagement that a traditional
infrastructure would be unable
to deliver.
SQL Server
On-premises
Agent
Azure Blobs
Azure HDInsight
Give
Graph
Azure Tables
Web APIWebsite +
Event store
Service Bus
Serves results
Azure Cache
Activity
Feeds



*Pending IDC study found on a per TB basis,
Microsoft customers using cloud-based Hadoop
in Data Lake have a 63% lower TCO than on-
premises
Always on cluster Cluster as a service
Storage choice Local HDFS, Azure Blob, Azure
Data Lake Store
Azure Blob, Azure Data Lake Store
Job Scheduling Oozie Azure Data Factory
Data persistence after
cluster deletion
N/A Azure Blob, Azure Data Lake Store
Metadata persistence
after cluster deletion
N/A Azure SQL
Partition 1 Partition 2 Partition 3
2014-10.part0 2014-11.part0 2014-12.part0
2014-10.part1 2014-11.part1 2014-12.part1
2014-10.part2 2014-11.part2 2014-12.part2
2014-10.part3 2014-11.part3 2014-12.part3
Partition 1 Partition 2 Partition 3
2014-10.part0 2014-11.part0 2014-12.part0
2014-10.part1 2014-11.part1 2014-12.part1
2014-10.part2 2014-11.part2 2014-12.part2
2014-10.part3 2014-11.part3 2014-12.part3
Partition 1 Partition 2 Partition 3
2014-10.part0 2014-10.part1 2014-10.part2
2014-11.part0 2014-11.part1 2014-11.part2
2014-12.part0 2014-12.part1
2014-
12.part2
2014-10.part3
2014-11.part3
2014-12.part3
Partition 4
No limits on file sizes
Analytics scale on demand
No code rewrites as you increase size of data stored
Optimized for massive throughput
Optimized for IOT with high volume of small writes
PB
TB GB
PB
TB
Goals
•
•
•
•
Azure Security: Encryption At Rest
Azure Blob Storage (In Preview)
• Encryption @ rest using Microsoft managed keys
• Customers can use Azure Storage configuration to manage
encryption. No HDInsight changes required.
RBAC : Securing HDInsight with Blue Talon (ISV)
Deep integration to Visual Studio
Easy for novices to write simple queries
Robust environment for experts to also be productive
Integrated with Pig, Hive, and Storm
Playback that visualizes performance to identify
bottlenecks and areas for optimization




Hadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise Customers

Contenu connexe

Tendances

Hadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureHadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureDataWorks Summit
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the CloudDataWorks Summit
 
Accelerating Big Data Insights
Accelerating Big Data InsightsAccelerating Big Data Insights
Accelerating Big Data InsightsDataWorks Summit
 
A New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseA New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseDataWorks Summit/Hadoop Summit
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemGregg Barrett
 
Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...DataWorks Summit
 
Interactive query in hadoop
Interactive query in hadoopInteractive query in hadoop
Interactive query in hadoopRommel Garcia
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...DataWorks Summit/Hadoop Summit
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsDataWorks Summit/Hadoop Summit
 
Overview of stinger interactive query for hive
Overview of stinger   interactive query for hiveOverview of stinger   interactive query for hive
Overview of stinger interactive query for hiveDavid Kaiser
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesDataWorks Summit
 
The Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceThe Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceBlueData, Inc.
 

Tendances (20)

Hadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureHadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and Future
 
Big Data Platform Industrialization
Big Data Platform Industrialization Big Data Platform Industrialization
Big Data Platform Industrialization
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the Cloud
 
HPE Keynote Hadoop Summit San Jose 2016
HPE Keynote Hadoop Summit San Jose 2016HPE Keynote Hadoop Summit San Jose 2016
HPE Keynote Hadoop Summit San Jose 2016
 
Accelerating Big Data Insights
Accelerating Big Data InsightsAccelerating Big Data Insights
Accelerating Big Data Insights
 
A New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseA New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouse
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystem
 
Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...
 
Interactive query in hadoop
Interactive query in hadoopInteractive query in hadoop
Interactive query in hadoop
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
 
Big Data Platform Industrialization
Big Data Platform Industrialization Big Data Platform Industrialization
Big Data Platform Industrialization
 
Hadoop Everywhere
Hadoop EverywhereHadoop Everywhere
Hadoop Everywhere
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
 
Empower Data-Driven Organizations
Empower Data-Driven OrganizationsEmpower Data-Driven Organizations
Empower Data-Driven Organizations
 
Deep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profitDeep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profit
 
Overview of stinger interactive query for hive
Overview of stinger   interactive query for hiveOverview of stinger   interactive query for hive
Overview of stinger interactive query for hive
 
ebay
ebayebay
ebay
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management Challenges
 
The Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceThe Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-Service
 

En vedette

Using a Data Lake at the core of a Life Assurance business
Using a Data Lake at the core of a Life Assurance businessUsing a Data Lake at the core of a Life Assurance business
Using a Data Lake at the core of a Life Assurance businessDataWorks Summit/Hadoop Summit
 
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...DataWorks Summit/Hadoop Summit
 
Moving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudMoving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudDataWorks Summit/Hadoop Summit
 
Pillars of Heterogeneous HDFS Storage
Pillars of Heterogeneous HDFS StoragePillars of Heterogeneous HDFS Storage
Pillars of Heterogeneous HDFS StoragePete Kisich
 
certificate 100 best graduates
certificate 100 best graduatescertificate 100 best graduates
certificate 100 best graduatesToma Gaidyte
 
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...Cloudera, Inc.
 
Farming hadoop in_the_cloud
Farming hadoop in_the_cloudFarming hadoop in_the_cloud
Farming hadoop in_the_cloudSteve Loughran
 
Why PTC for SLM?
Why PTC for SLM?Why PTC for SLM?
Why PTC for SLM?Tom Kenslea
 
Cloud- A Technical or Organisational Challenge? Or Both?
Cloud- A Technical or Organisational Challenge? Or Both?Cloud- A Technical or Organisational Challenge? Or Both?
Cloud- A Technical or Organisational Challenge? Or Both?Justin Pirie
 
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature DataWorks Summit
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionDataWorks Summit/Hadoop Summit
 
Designing the Industrial Internet
Designing the Industrial InternetDesigning the Industrial Internet
Designing the Industrial InternetDane Petersen
 
Meeting Performance Goals in multi-tenant Hadoop Clusters
Meeting Performance Goals in multi-tenant Hadoop ClustersMeeting Performance Goals in multi-tenant Hadoop Clusters
Meeting Performance Goals in multi-tenant Hadoop ClustersDataWorks Summit/Hadoop Summit
 
Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud? Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud? DataWorks Summit
 
The key to unlocking the Value in the IoT? Managing the Data!
The key to unlocking the Value in the IoT? Managing the Data!The key to unlocking the Value in the IoT? Managing the Data!
The key to unlocking the Value in the IoT? Managing the Data!DataWorks Summit/Hadoop Summit
 

En vedette (20)

Using a Data Lake at the core of a Life Assurance business
Using a Data Lake at the core of a Life Assurance businessUsing a Data Lake at the core of a Life Assurance business
Using a Data Lake at the core of a Life Assurance business
 
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
 
Moving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudMoving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloud
 
Pillars of Heterogeneous HDFS Storage
Pillars of Heterogeneous HDFS StoragePillars of Heterogeneous HDFS Storage
Pillars of Heterogeneous HDFS Storage
 
certificate 100 best graduates
certificate 100 best graduatescertificate 100 best graduates
certificate 100 best graduates
 
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
 
Farming hadoop in_the_cloud
Farming hadoop in_the_cloudFarming hadoop in_the_cloud
Farming hadoop in_the_cloud
 
Why PTC for SLM?
Why PTC for SLM?Why PTC for SLM?
Why PTC for SLM?
 
HDFS Tiered Storage
HDFS Tiered StorageHDFS Tiered Storage
HDFS Tiered Storage
 
Cloud- A Technical or Organisational Challenge? Or Both?
Cloud- A Technical or Organisational Challenge? Or Both?Cloud- A Technical or Organisational Challenge? Or Both?
Cloud- A Technical or Organisational Challenge? Or Both?
 
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
 
Designing the Industrial Internet
Designing the Industrial InternetDesigning the Industrial Internet
Designing the Industrial Internet
 
Meeting Performance Goals in multi-tenant Hadoop Clusters
Meeting Performance Goals in multi-tenant Hadoop ClustersMeeting Performance Goals in multi-tenant Hadoop Clusters
Meeting Performance Goals in multi-tenant Hadoop Clusters
 
Retaam_ThingWorx
Retaam_ThingWorxRetaam_ThingWorx
Retaam_ThingWorx
 
Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud? Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud?
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
Data Process Systems, connecting everything
Data Process Systems, connecting everythingData Process Systems, connecting everything
Data Process Systems, connecting everything
 
The key to unlocking the Value in the IoT? Managing the Data!
The key to unlocking the Value in the IoT? Managing the Data!The key to unlocking the Value in the IoT? Managing the Data!
The key to unlocking the Value in the IoT? Managing the Data!
 
Log I am your father
Log I am your fatherLog I am your father
Log I am your father
 

Similaire à Hadoop in the Cloud: Real World Lessons from Enterprise Customers

Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopAvkash Chauhan
 
Hadoop on OpenStack
Hadoop on OpenStackHadoop on OpenStack
Hadoop on OpenStackSandeep Raju
 
Hadoop workshop
Hadoop workshopHadoop workshop
Hadoop workshopFang Mac
 
Big Data Infrastructure
Big Data InfrastructureBig Data Infrastructure
Big Data InfrastructureTrivadis
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weitingWei Ting Chen
 
Getting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Getting Started with Apache Spark and Alluxio for Blazingly Fast AnalyticsGetting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Getting Started with Apache Spark and Alluxio for Blazingly Fast AnalyticsAlluxio, Inc.
 
What it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! PerspectivesWhat it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! PerspectivesDataWorks Summit
 
Accelerate Spark Workloads on S3
Accelerate Spark Workloads on S3Accelerate Spark Workloads on S3
Accelerate Spark Workloads on S3Alluxio, Inc.
 
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Sumeet Singh
 
Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Carole Gunst
 
Big Data on Azure Tutorial
Big Data on Azure TutorialBig Data on Azure Tutorial
Big Data on Azure Tutorialrustd
 
Open Source Data Orchestration for AI, Big Data, and Cloud
Open Source Data Orchestration for AI, Big Data, and CloudOpen Source Data Orchestration for AI, Big Data, and Cloud
Open Source Data Orchestration for AI, Big Data, and CloudAlluxio, Inc.
 
Hadoop and CLOUDIAN HyperStore
Hadoop and CLOUDIAN HyperStoreHadoop and CLOUDIAN HyperStore
Hadoop and CLOUDIAN HyperStoreCLOUDIAN KK
 
Alluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the CloudAlluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the CloudShubham Tagra
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)Sascha Dittmann
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld
 
Hadoop - Past, Present and Future - v1.1
Hadoop - Past, Present and Future - v1.1Hadoop - Past, Present and Future - v1.1
Hadoop - Past, Present and Future - v1.1Big Data Joe™ Rossi
 
Hadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, ProvidersHadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, ProvidersMrigendra Sharma
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Ranjith Sekar
 

Similaire à Hadoop in the Cloud: Real World Lessons from Enterprise Customers (20)

Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache Hadoop
 
Hadoop on OpenStack
Hadoop on OpenStackHadoop on OpenStack
Hadoop on OpenStack
 
Hadoop workshop
Hadoop workshopHadoop workshop
Hadoop workshop
 
Big Data Infrastructure
Big Data InfrastructureBig Data Infrastructure
Big Data Infrastructure
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting
 
Getting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Getting Started with Apache Spark and Alluxio for Blazingly Fast AnalyticsGetting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Getting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
 
What it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! PerspectivesWhat it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! Perspectives
 
Accelerate Spark Workloads on S3
Accelerate Spark Workloads on S3Accelerate Spark Workloads on S3
Accelerate Spark Workloads on S3
 
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
 
Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2
 
Hadoop info
Hadoop infoHadoop info
Hadoop info
 
Big Data on Azure Tutorial
Big Data on Azure TutorialBig Data on Azure Tutorial
Big Data on Azure Tutorial
 
Open Source Data Orchestration for AI, Big Data, and Cloud
Open Source Data Orchestration for AI, Big Data, and CloudOpen Source Data Orchestration for AI, Big Data, and Cloud
Open Source Data Orchestration for AI, Big Data, and Cloud
 
Hadoop and CLOUDIAN HyperStore
Hadoop and CLOUDIAN HyperStoreHadoop and CLOUDIAN HyperStore
Hadoop and CLOUDIAN HyperStore
 
Alluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the CloudAlluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the Cloud
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
 
Hadoop - Past, Present and Future - v1.1
Hadoop - Past, Present and Future - v1.1Hadoop - Past, Present and Future - v1.1
Hadoop - Past, Present and Future - v1.1
 
Hadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, ProvidersHadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, Providers
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 

Plus de DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 

Plus de DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 

Dernier

Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 

Dernier (20)

Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 

Hadoop in the Cloud: Real World Lessons from Enterprise Customers

  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6. Storage Azure HDInsight R Server Local (HDFS) or Cloud (Azure Blob/Azure Data Lake Store) Analytics
  • 7. Azure HDInsight Hadoop Meets the Cloud Microsoft’s managed Hadoop as a Service 100% open source Apache Hadoop Built on the latest releases across Hadoop (2.7) Up and running in minutes with no hardware to deploy Run on Windows or Linux Supported by Microsoft
  • 8.
  • 9. Rockwell Automation is partnered with one of the six oil and gas super majors to build unmanned internet-connected gas dispensers. Each dispenser emits real-time management metrics allowing them to detect anomalies and predict when proactive maintenance needs to occur. Store sensor data every 5 minutes  Temperature, pressure, vibration, etc.  Tens of thousands of data points / second Azure Blobs Azure HDInsight Hive, Pig, Azure SQL DB Power BI for O365 Mobile Notification Hub Mobile Device Real-time notification
  • 10. JustGiving wanted to harness the power of their data by using network science to map people’s connections and relationships so that they could connect people with the causes they care about. Based on 15 years of data, the JustGiving GiveGraph is the world’s largest ecosystem of giving behavior. It contains more than 81 million person nodes, thousands of causes and 285 million connections and is the engine that drives JustGiving’s social platform, enabling levels of personalization and engagement that a traditional infrastructure would be unable to deliver. SQL Server On-premises Agent Azure Blobs Azure HDInsight Give Graph Azure Tables Web APIWebsite + Event store Service Bus Serves results Azure Cache Activity Feeds
  • 12.
  • 13.
  • 14. *Pending IDC study found on a per TB basis, Microsoft customers using cloud-based Hadoop in Data Lake have a 63% lower TCO than on- premises
  • 15.
  • 16.
  • 17.
  • 18. Always on cluster Cluster as a service Storage choice Local HDFS, Azure Blob, Azure Data Lake Store Azure Blob, Azure Data Lake Store Job Scheduling Oozie Azure Data Factory Data persistence after cluster deletion N/A Azure Blob, Azure Data Lake Store Metadata persistence after cluster deletion N/A Azure SQL
  • 19.
  • 20.
  • 21. Partition 1 Partition 2 Partition 3 2014-10.part0 2014-11.part0 2014-12.part0 2014-10.part1 2014-11.part1 2014-12.part1 2014-10.part2 2014-11.part2 2014-12.part2 2014-10.part3 2014-11.part3 2014-12.part3
  • 22. Partition 1 Partition 2 Partition 3 2014-10.part0 2014-11.part0 2014-12.part0 2014-10.part1 2014-11.part1 2014-12.part1 2014-10.part2 2014-11.part2 2014-12.part2 2014-10.part3 2014-11.part3 2014-12.part3
  • 23. Partition 1 Partition 2 Partition 3 2014-10.part0 2014-10.part1 2014-10.part2 2014-11.part0 2014-11.part1 2014-11.part2 2014-12.part0 2014-12.part1 2014- 12.part2 2014-10.part3 2014-11.part3 2014-12.part3 Partition 4
  • 24. No limits on file sizes Analytics scale on demand No code rewrites as you increase size of data stored Optimized for massive throughput Optimized for IOT with high volume of small writes PB TB GB PB TB
  • 25. Goals
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 32.
  • 33. Azure Security: Encryption At Rest Azure Blob Storage (In Preview) • Encryption @ rest using Microsoft managed keys • Customers can use Azure Storage configuration to manage encryption. No HDInsight changes required.
  • 34. RBAC : Securing HDInsight with Blue Talon (ISV)
  • 35.
  • 36. Deep integration to Visual Studio Easy for novices to write simple queries Robust environment for experts to also be productive Integrated with Pig, Hive, and Storm Playback that visualizes performance to identify bottlenecks and areas for optimization

Notes de l'éditeur

  1. Challenge Manage sites used for dispensing liquefied natural gas (clean fuel for commercial customers who do heavy-duty road transportation) Built LNG refueling stations across US interstate highway Stations are unmanned so they built 24x7 remote management and monitoring to track diagnostics of each station for maintenance or tuning Built internet-connected sensors embedded in 350 dispenser sites worldwide generating tens of thousands data points per second Temperature, pressure, vibration, etc. Data needs outgrew company’s internal datacenter and data warehouse Solution Chose Azure HDInsight, Data Factory, SQL Database Dashboards used to detect anomalies for proactive maintenance Changes in performance of the components Energy consumption of components  Component downtime and reliability  Future: Goal is to expand program to hundreds of thousands of dispensers How They Did It Collect data from internet-collected sensors Tens of thousands data points per second Interpolate time-series prior to analysis Stored raw sensor data in Blobs every 5 minutes Use Hadoop to execute scripts and Data Factory to orchestrate Hive and Pig scripts orchestrated by Data Factory Data resulting from scripts loaded in SQL Database Queries detect site anomalies to indicate maintenance/tuning Produced dashboards with role-based reporting Azure Machine Learning , SSRS, Power BI for O365 Provide users with customizable interface View current and historical data (day-to-day operations, asset performance over time, etc.) Leveraged Azure Mobile Notification Hub for real-time notifications, alarms, or important events
  2. JustGiving wanted to identify what was personal and relevant to people and what they cared about, so that they could suggest further causes that may inspire continual involvement. However, with 22 million customers this meant storing and processing huge amounts of data that their existing infrastructure simply couldn’t support. HDInsight, provided the scalable, ‘on-demand’ processing and analysis ability to assist JustGiving with its goal to constantly evolve the personalised experiences it provides to customers. JustGiving is a global online social platform for giving. It's a financial service (not a charity) that lets you "raise money for a cause you care about" through your network of friends. JustGiving's goal is to become "Facebook of Giving" JG preferred not to refer to Facebook as a way of describing themselves – they are using the term “social giving” and like to refer to themselves as a “tech for good” company. i.e. harness the effect to make charity a group activity which isn't just a onetime event but rather something you stay in touch with on regular basis.   More details on charity goals in a blog post from JustGiving: http://blog.justgiving.com/creating-smart-targets-to-help-fundraisers-raise-more/ Technical Details:   Workflow: There is one a set of  daily HDInsight job that uses the data coming through SQL Server to build out the social graph and provides activity recommendations to the user. The input data is 20-30 GB/job but the output is in 'hundreds' of GBs as relationships are de-normalized/expanded. Azure Table Storage is used to serve 'News Feed' to users. The data in Table store come from two main sources: Real time activity feeds/events coming in from Azure Service Bus. (~50 events/second) Activity recommendation coming out of the daily HDInsight job There are several MR processes to create the graph; once that is done, further jobs create the denormalised activity feeds for all users. M/R jobs that do all the graph building.
  3. Wind turbines – used in windmills to turn wind into power   These things are phenomenally complex devices They emit a ton of telemetry Every 25 milliseconds they emit 10 interesting datapoints 100x of wind mills (turbines) per windfarm They are managing about a hundred windfarms through their customers Their customers are the power companies SO yes this is a “IOT scenario” Couple of challenges How am I going to collect all this information – windfarms around the globe, some in remote areas, how to collect How to I store it cheaply – lot of data – honestly no single row interesting – only interesting in aggregate – but because there’s ns much of it putting in DW it is cost prohibitive Instead they landed data in cloud from around the world Cloud allowed storing it cheaply – just store no processing Then bring processing on top of it to distill and refine the data into something more powerful They cooked data down - refined, aggregate, duplicate detection Now they provided interesting metrics to power companies But since they had the source data around they can ask lots of interesting questions on *all* the data since they started collecting it Moved beyond charts and graphs bout how the windfarms were performing Now able to mash that together with maintenance data  To be able to say what are some of the things correlated with my turbines failing These are very expensive pieces of machinery so if we can find things that are going awry *before* the whole thing breaks its usually much cheaper Now one of the interesting things they found was if they were more observant about dust conditions .. they were saving million dollar generators by replacing four dollar filters more often And so this is a interesting example of predictive maintenance
  4. Choose VM based on core size and memory
  5. Choose VM based on core size and memory
  6. Choose VM based on core size and memory
  7. Finding the right tools to design and tune your big data queries can be difficult. Data Lake makes it easy through deep integration with Visual Studio, so that you can use familiar tools to run, debug, and tune your code. Visualizations of your U-SQL, Hive, and Storm jobs let you see how your code runs at scale and identify performance bottlenecks and cost optimizations, making it easier to tune your queries.