With major vendors working hard to ease the provisioning of Hadoop, resulting in many Hadoop-as-a-Service (HaaS) offerings, what is the challenge domain for Big Data engineers in 2014? If HaaS is a highway, where does it lead, and how do you travel on it?
In this fast-paced, L300 hands-on session, Andy will demonstrate Hadoop in practice using Microsoft's cloud technologies, building a system from scratch that ingests information into HDInsight and then queries and reports on that information.
This session presumes prior knowledge of MapReduce technologies, Hadoop, HDFS and HCatalog (HCat).
Highway to the Information Zone (Andy Cross)
1. Premium community conference on Microsoft technologies @itcampro #itcamp14
Highway to the Information Zone
Solving 3 key challenges of building Big Data solutions in the cloud
@andybareweb
Big Data core ethos: distribute the workload to achieve throughput on IO-bound operations
Flat files + Compute = Azure
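Distributing an IO-bound workload starts with cutting flat files into independent pieces that separate workers can read in parallel. A minimal sketch, assuming fixed-size splits (the sizes in the example are hypothetical, not HDInsight settings), of dividing a file's byte range into one (offset, length) split per worker task:

```python
from typing import List, Tuple


def compute_splits(file_size: int, split_size: int) -> List[Tuple[int, int]]:
    """Divide the byte range [0, file_size) into contiguous (offset, length) splits.

    Each split can be handed to a separate worker, so the IO-bound reads
    happen in parallel instead of one after another.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits: List[Tuple[int, int]] = []
    offset = 0
    while offset < file_size:
        # The last split may be shorter than split_size.
        length = min(split_size, file_size - offset)
        splits.append((offset, length))
        offset += length
    return splits


if __name__ == "__main__":
    MB = 1024 * 1024
    # Hypothetical 300 MB file cut into 128 MB splits: three parallel tasks.
    for offset, length in compute_splits(300 * MB, 128 * MB):
        print(offset // MB, length // MB)
```

Real Hadoop input formats also deal with records that straddle a split boundary (for example, a text line crossing it), which this sketch ignores.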
HDInsight – Hadoop as a Service
• Generally available (GA), managed Hadoop 2 on Microsoft Azure
• Familiar tools such as Hive, Pig and Oozie
• Additional best-of-breed (BoB) Microsoft ecosystem tooling with the .NET SDK
– PowerShell and .NET for provisioning
– Hive execution with .NET and PowerShell
• Paired with Hortonworks HDP for on-premises Hadoop; compatible with all major Hadoop implementations
• Combined with Excel and the traditional Microsoft BI stack for compelling solutions
What is Hadoop?
• MapReduce: a simple programming style for efficient distribution
• A cluster topology designed for resilience and efficiency:
– Name Node & Job Tracker
– Data Node & Task Tracker, repeated on every worker node in the cluster
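The MapReduce programming style comes down to three stages: a mapper emits key/value pairs, the framework groups (and in Hadoop, sorts) them by key, and a reducer combines each group. A minimal single-process sketch in Python, using the customary word-count example rather than anything taken from the talk, shows the stages that Hadoop spreads across the Task Trackers:

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values seen for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # In Hadoop each stage would run on many nodes; here they run in sequence.
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    records = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(records))
```

Because mappers only see one record and reducers only see one key's values, either stage can be parallelised without the programmer coordinating the workers.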
Apply innovative expressions of logic over a stored mass of data
Blank Canvas
• Windows Azure subscription
– Capacity to provision HDInsight
– Capacity to provision a Storage Account
We need somewhere to execute!
• PowerShell / C# / xplat CLI
• All of these give further configuration options, including:
– Boost performance by increasing IOPS: stripe data across many Storage Accounts
– Manage cluster-specific configuration: core-site, mapred-site and hdfs-site
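The core-site, mapred-site and hdfs-site files use Hadoop's XML configuration format: a configuration element containing property entries, each with a name and a value. A minimal sketch that renders such a file from a Python dictionary; the property and value in the example are illustrative only, and the provisioning tools listed above may accept these settings in their own form rather than as raw XML:

```python
import xml.etree.ElementTree as ET
from typing import Mapping


def render_site_xml(properties: Mapping[str, str]) -> str:
    """Render a Hadoop *-site.xml document from name -> value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    # encoding="unicode" makes tostring return a str rather than bytes.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative mapred-site override; the value is an example, not a recommendation.
    print(render_site_xml({"mapreduce.task.timeout": "1200000"}))
```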
DEMO
Provision a customised HDInsight cluster via PowerShell
Shard data to boost performance
Shard source data across several Azure Storage accounts to reach more than 5000 IOPS per HDInsight cluster
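One simple way to shard files across several storage accounts is to hash each file path to an account, so reads draw on every account's IOPS budget. This is a sketch under assumptions: hash-based placement and the account names are hypothetical illustrations, not necessarily how the demo distributed its data.

```python
import hashlib
from collections import Counter
from typing import Sequence


def account_for(path: str, accounts: Sequence[str]) -> str:
    """Pick a storage account for a file by hashing its path.

    A stable hash (MD5 here) is used instead of Python's built-in hash(),
    which is randomised per process for strings, so placement is repeatable.
    """
    if not accounts:
        raise ValueError("at least one account is required")
    digest = hashlib.md5(path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(accounts)
    return accounts[index]


if __name__ == "__main__":
    # Hypothetical storage account names.
    accounts = ["shard01", "shard02", "shard03"]
    files = [f"/input/part-{i:05d}.csv" for i in range(300)]
    # The spread across accounts should be roughly even.
    print(Counter(account_for(f, accounts) for f in files))
```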
Isolate logs (best practice)
Use a separate storage account for logs, created automatically at the same time as the cluster
Explanation of WASB
• Windows Azure Storage Blobs (WASB)
– Equivalent to Azure Blob Storage
• Mounted as an HDFS-compatible file system
– Hadoop can read and write Azure Blobs directly
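Hadoop addresses WASB locations with URIs of the form wasb://<container>@<account>.blob.core.windows.net/<path> (wasbs:// for TLS). A small sketch of helpers for building and parsing such URIs; the helper names and the container and account names in the example are hypothetical:

```python
from typing import NamedTuple
from urllib.parse import urlparse

BLOB_SUFFIX = ".blob.core.windows.net"


class WasbLocation(NamedTuple):
    container: str
    account: str
    path: str  # always stored with a leading "/"


def to_uri(loc: WasbLocation, secure: bool = False) -> str:
    """Build a wasb:// (or wasbs://) URI for a blob path."""
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{loc.container}@{loc.account}{BLOB_SUFFIX}/{loc.path.lstrip('/')}"


def parse_uri(uri: str) -> WasbLocation:
    """Split a WASB URI back into container, account and path."""
    parts = urlparse(uri)
    if parts.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"not a WASB URI: {uri}")
    container, sep, host = parts.netloc.partition("@")
    if not sep or not host.endswith(BLOB_SUFFIX):
        raise ValueError(f"unexpected authority: {parts.netloc}")
    account = host[: -len(BLOB_SUFFIX)]
    return WasbLocation(container, account, "/" + parts.path.lstrip("/"))


if __name__ == "__main__":
    loc = WasbLocation("democontainer", "mystorageaccount", "/path/to/file")
    uri = to_uri(loc)
    print(uri)
    print(parse_uri(uri))
```

A fully qualified URI like this can be passed to hadoop fs commands in place of a bare path when the data lives outside the cluster's default container.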
DEMO
Upload a file to a new WASB location, then read it with hadoop fs -cat /path/to/file
In reality, you will have a file pipeline; my solution is Cloud Data Sync Agent.
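How Cloud Data Sync Agent works is not described here. As a hypothetical illustration of one piece any such file pipeline needs, the sketch below decides which local files are new or changed since the last run by comparing (size, modification time) snapshots. The function names and the snapshot approach are assumptions, not the agent's design; a real pipeline would then upload the listed files to WASB.

```python
import os
from typing import Dict, List, Tuple

# Relative path (with "/" separators) -> (size in bytes, modification time).
Snapshot = Dict[str, Tuple[int, float]]


def take_snapshot(root: str) -> Snapshot:
    """Record size and mtime for every file under root."""
    snap: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            snap[rel] = (st.st_size, st.st_mtime)
    return snap


def files_to_upload(previous: Snapshot, current: Snapshot) -> List[str]:
    """Return paths that are new, or whose size or mtime changed, since `previous`."""
    return sorted(path for path, meta in current.items() if previous.get(path) != meta)


if __name__ == "__main__":
    before: Snapshot = {}
    now = take_snapshot(".")
    # On a first run every file is treated as new.
    print(len(files_to_upload(before, now)), "file(s) to upload")
```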