Big Data Platform Building Blocks: Serengeti,
Resource Management,
and Virtualization Extensions
Abhishek Kashyap, Pivotal
Kevin Leong, VMware
VAPP5762
#VAPP5762
22
Agenda
 Big Data, Hadoop, and What It Means to You
 The VMware Big Data Platform
• Operate Clusters Simply
• Share Infrastructure Efficiently
• Leverage Existing Investment
 Pivotal and VMware: Partnering to Virtualize Hadoop
 Conclusion and Q&A
33
Big Data, Hadoop, and
What It Means to You
44
What is Hadoop?
 Framework that allows for distributed processing of large data sets
across clusters of commodity servers
• Store large amounts of data
• Process the large amounts of data stored
 Inspired by Google’s MapReduce and Google File System (GFS)
papers
 Apache Open Source Project
• Initial work done at Yahoo! starting in 2005
• Open sourced in 2009; there is now a very active open source community
55
What is Hadoop?
 Storage & Compute in One Framework
 Open Source Project of the Apache Software Foundation
 Java-intensive programming required
Two Core Components
• HDFS: scalable storage in the Hadoop Distributed File System
• MapReduce: compute via the MapReduce distributed processing platform
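To make the MapReduce half of this slide concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps for a word count. It is an illustration only: real Hadoop jobs are typically written against the Java API and run distributed over HDFS blocks, and the function names below are hypothetical, not part of Hadoop.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of text records."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map calls run on the nodes that hold the data, and the shuffle moves grouped keys across the network to the reducers.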
66
Why Hadoop?
 HDFS provides cheap and reliable storage on commodity hardware
 In-place data analysis, rather than moving from file systems to data
warehouses
 Ability to analyze structured and unstructured data
Enables better business decisions from more types of data at
higher speeds and lower costs
77
Use Case: Data Warehouse Augmentation / Offload
 Challenges
• Existing EDW used for low-value, resource-consuming ETL processes
• Planned growth will far exceed compute capacity
• Hard to do analytics or even basic reporting on EDW system
 Objectives
• Reduce EDW Total Cost of Ownership
• Retain data longer to enable analytics and accelerate time to market
• Migrate ETL off EDW to free up compute resources
88
Use Case: Retailer Trend Analysis
 Deep Historical Reporting for Retail Trends:
• Credit card company loads 10 years of data for all retailers (hundreds of TB)
• Run a MapReduce job to develop a historical picture of retailers in a specific area
• Load results from Hadoop into data warehouse and further analyze with
standard BI/statistics packages
 Why do this in Hadoop?
• Ability to store years of data cost effectively
• Data available for immediate recall (not on tapes or flat files)
• No need to ETL/normalize the data
• Data exists in its valuable, original format
• Offload intensive computation from DW
• Ability to combine structured and unstructured data
1212
Pivotal HD
Apache:
• HDFS
• HBase
• Pig, Hive, Mahout
• MapReduce
• Sqoop, Flume
• Resource Management & Workflow: YARN, ZooKeeper
Pivotal HD Enterprise:
• Command Center: Deploy, Configure, Monitor, Manage
• Data Loader
• Hadoop Virtualization (HVE)
HAWQ, Advanced Database Services:
• ANSI SQL + Analytics
• Xtension Framework
• Catalog Services
• Query Optimizer
• Dynamic Pipelining
Also shown: Spring XD, Pivotal Analytics, Pivotal Chorus & Alpine Miner, MoreVRP
1313
The VMware Big Data Platform
1414
The Big Data Journey in the Enterprise
Stage 1: Hadoop Piloting
• Often start with line of business
• Try 1 or 2 use cases to explore the value of Hadoop
Stage 2: Hadoop Production
• Serve a few departments
• More use cases
• Growing # and size of clusters
• Core Hadoop + components
Stage 3: Cloud Analytics Platform
• Serve many departments
• Often part of mission critical workflow
• Fully integrated with analytics/BI tools
Axes: scale (0 nodes, 10s, 100s) and integration (standalone to integrated)
1515
Getting from Here to There
Compute Layer: Hadoop test/dev
Data Layer: Shared File System
Virtualization
Host Host Host Host Host Host Host
1616
Getting from Here to There
Compute Layer: Hadoop test/dev, Hadoop production, Hadoop production, Hadoop experimentation
Data Layer: Shared File System
Virtualization
Host Host Host Host Host Host Host
1717
Getting from Here to There
Compute Layer: Hadoop test/dev; Hadoop production; HBase; SQL on Hadoop (HAWQ, Impala, Drill); NoSQL (Cassandra, Mongo); Other (Spark, Shark, Solr, Platfora)
Data Layer: Shared File System
Virtualization
Host Host Host Host Host Host Host
1818
Benefits of Virtualization at Each Stage
Stage 1: Hadoop Piloting
 Rapid deployment
 On the fly cluster resizing
 Flexible config
 Automation of cluster lifecycle
Stage 2: Hadoop Production
 High Availability
 Consolidation
 Tiered SLAs
 Elastic Scaling
Stage 3: Cloud Analytics Platform
 Mixed workloads
 Right tool at the right time
 Flexible and elastic infrastructure
Axes: scale (0 nodes, 10s, 100s) and integration (standalone to integrated)
1919
A Brief History of Project Serengeti
(and Big Data at VMware)
2020
Big Data Initiatives at VMware
Serengeti
vSphere
Resource
Management
Hadoop
Virtualization
Extensions
 Virtualization changes
for core Hadoop
 Contributed back to
Apache Hadoop
 Advanced resource
management on vSphere
 Big Data applications-specific
extension to DRS
 Open source project
 Tool to simplify virtualized Hadoop
deployment & operations
2121
Clustered Workload Management: The Next Frontier
ESXi
Serengeti
Hadoop
Management
Virtualization
vCenter
Source: http://www.conferencebike.com/image/generated/792.png
2222
Serengeti Project History
Serengeti 0.5
June 2012
Serengeti 0.6
August 2012
Serengeti 0.7
October 2012
Serengeti 0.8
April 2013
• Hadoop in
10 min
• Highly
Available
Hadoop
• Time to
insight
• Configuring
Hadoop
• Compute
elasticity
• Configuring
placement
and topology
• HBase
• MapR
• CDH4
• Performance
best
practices
Serengeti 0.9/
BDE Beta
June 2013
• Integrated
GUI
• Automatic
elasticity
• YARN/
Pivotal HD
2323
Big Data Extensions: Serengeti-vCenter Integration
ESXi
Hadoop
Management
Virtualization
Big Data Extensions + vCenter
2424
Why Virtualize Hadoop
Operate
Clusters
Simply
Share
Infrastructure
Efficiently
Leverage
Existing
Investment
2525
Operate Clusters Simply
Serengeti
2626
What Does Nick Think About Hadoop?
• “I don’t want to be the bottleneck when it comes to provisioning Hadoop clusters”
• “I need sizing flexibility, because my Hadoop users don’t know how large of a cluster they need”
• “I want to establish a repeatable process for deploying Hadoop clusters”
• “I don’t really know that much about Hadoop”
• “I want to better manage the jumble of LOB Hadoop clusters in my enterprise”
Source: http://www.smartdraw.com/solutions/information-technology/images/nick.png
2727
Choose Your Own Adventure
Source: http://www.vintagecomputing.com/wp-content/images/retroscan/supercomputer_cyoa_large.jpg
2828
Big Data Extensions Demo
2929
Deploy Hadoop Clusters in Minutes
From manual process:
• Server preparation
• OS installation
• Network configuration
• Hadoop installation and configuration
To fully automated, using the GUI
3030
How It Works
 BDE is packaged as a virtual appliance, which can be easily
deployed on vCenter
 BDE works as a vCenter extension and establishes SSL connection
with vCenter
 BDE clones VMs from the template and controls/configures VMs
through vCenter
Diagram: the BDE virtual appliance (management server) clones Hadoop node VMs from a template through vCenter, onto hosts running the virtualization platform.
3131
User-specified Customizations Using Cluster Specification File
• Storage configuration: choice of shared or local
• High Availability option
• Number of nodes and resource configuration
• VM placement policies
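As a rough illustration of the customizations listed above, the sketch below builds a cluster specification as a Python dictionary and serializes it to JSON. The slides do not show the file's schema, so every field name here (`nodeGroups`, `instanceNum`, `haFlag`, `placementPolicies`, and so on) is an illustrative assumption; consult the Serengeti/BDE documentation for the actual format.

```python
import json


def build_cluster_spec(workers=3):
    """Return a dict describing a small cluster. All field names are illustrative."""
    return {
        "nodeGroups": [
            {
                "name": "master",
                "roles": ["hadoop_namenode", "hadoop_jobtracker"],
                "instanceNum": 1,                              # number of nodes
                "cpuNum": 2,                                   # resource configuration
                "memCapacityMB": 8192,
                "storage": {"type": "SHARED", "sizeGB": 50},   # shared storage
                "haFlag": "on",                                # High Availability option
            },
            {
                "name": "worker",
                "roles": ["hadoop_datanode", "hadoop_tasktracker"],
                "instanceNum": workers,
                "cpuNum": 2,
                "memCapacityMB": 4096,
                "storage": {"type": "LOCAL", "sizeGB": 100},   # local storage
                "placementPolicies": {"instancePerHost": 1},   # VM placement policy
            },
        ]
    }


def total_nodes(spec):
    """Count the VMs that the specification asks for."""
    return sum(group["instanceNum"] for group in spec["nodeGroups"])


if __name__ == "__main__":
    print(json.dumps(build_cluster_spec(), indent=2))
```

Keeping the master group on shared storage with HA enabled and the workers on local storage mirrors the hybrid storage model described later in the deck.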
3232
Deploy, Manage, Run Virtual Hadoop with BDE
Cycle: Deploy, Customize, Load data, Execute jobs, Tune configuration, Scale, …
3333
Agility and Operational Simplicity
Compute Layer: Hadoop test/dev
Data Layer: Shared File System
Virtualization
Host Host Host Host Host Host Host
3434
Share Infrastructure Efficiently
vSphere Resource Management
3535
Adult Supervision Required
CLUSTERS OVER
10 NODES
3636
Challenges of Running Hadoop in the Enterprise
Dept A (recommendation engine): Production, Test, Experimentation clusters
Dept B (ad targeting): Production, Test, Experimentation clusters
Data sources: Log files, Social data, Transaction data, Historical customer behavior
On the horizon…: NoSQL, Real time SQL, …
Pain Points:
1. Cluster sprawl
2. Redundant common data in separate clusters
3. Inefficient use of resources. Some clusters could be running at capacity while other clusters are sitting idle
3737
The Virtualization Advantage
One physical platform to support multiple virtual big data clusters
Before: separate Production, Test, and Experimentation clusters for each of Recommendation engine and Ad targeting
After: Production Recommendation Engine, Production Ad Targeting, Test/Dev, and Experimentation as virtual clusters on one platform
3838
What Other Things Does Nick Think About Hadoop?
Source: http://www.smartdraw.com/solutions/information-technology/images/nick.png
• “I want to scale out when my workload requires it”
• “My Hadoop users ask for large Hadoop clusters, which end up underutilized”
• “I want to offer Hadoop-as-a-Service in my private cloud”
• “I want to get all Hadoop clusters into a centralized environment to minimize spend”
3939
Achieving Multi-tenancy
 Resource Isolation
• Control the greedy noisy neighbor
• Reserve resources to meet needs
 Version Isolation
• Allow concurrent OS, App, Distro versions
 Security Isolation
• Provide privacy between users/groups
• Runtime and data privacy required
VMware vSphere + Serengeti
Host Host Host Host Host Host Host
4040
Separating Hadoop Data and Compute for Elasticity
Slave node layouts:
Hadoop in VM (combined storage/compute VM)
 VM lifecycle determined by Datanode
 Limited elasticity
 Limited to Hadoop Multi-Tenancy
Separate Storage (storage VM, compute VM)
 Separate compute from data
 Elastic compute
 Enable shared workloads
 Raise utilization
Separate Compute Tenants (storage VM, compute VMs for tenants T1 and T2)
 Separate virtual clusters per tenant
 Stronger VM-grade security and resource isolation
 Enable deployment of multiple Hadoop runtime versions
4141
Dynamic Hadoop Scaling
 Deploy separate compute clusters for different tenants
sharing HDFS
 Commission/decommission compute nodes according to priority
and available resources
Diagram: on VMware vSphere + Serengeti, a shared data layer sits under a compute layer. The Experimentation and Production (recommendation engine) clusters each have their own Job Tracker and a set of Compute VMs drawn from a dynamic resource pool.
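The commission/decommission idea on this slide can be sketched as a simple priority-ordered allocation over a fixed pool of compute VMs. This is a toy model under stated assumptions, not the algorithm used by Serengeti: the `Tenant` fields, the "higher number wins" priority rule, and the `rebalance` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tenant:
    name: str
    priority: int      # assumption: a higher number is served first
    demand: int        # compute VMs the tenant could use right now
    active: int = 0    # compute VMs currently commissioned


def rebalance(tenants, capacity):
    """Return {tenant name: change in compute VMs} for a pool of `capacity` VMs.

    A positive value means commission that many VMs; a negative value means
    decommission that many.
    """
    remaining = capacity
    target = {}
    # Serve tenants from highest to lowest priority until the pool is used up.
    for tenant in sorted(tenants, key=lambda t: t.priority, reverse=True):
        target[tenant.name] = min(tenant.demand, remaining)
        remaining -= target[tenant.name]
    return {t.name: target[t.name] - t.active for t in tenants}
```

With a 10-VM pool, a production tenant that wants 8 VMs takes them first, and a lower-priority experimentation tenant is shrunk to the 2 that remain.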
4242
Elastic Hadoop Demo
4343
Virtual Hadoop Manager (VHM)
• Inputs: cluster configuration from Serengeti; stats and VM configuration from vCenter Server (vCenter DB); state and stats (slots used, pending work) from the Hadoop Job Tracker
• Resource Management Module: algorithms that turn VC state and stats and Hadoop state and stats into VC actions and Hadoop actions
• Outputs: manual/auto power on/off of VMs through vCenter; commands (decommission, recommission) to the Job Tracker, which manages the Task Trackers
4444
Combining Elasticity and Multi-tenancy
Compute Layer: Hadoop test/dev, Hadoop production, Hadoop production, Hadoop experimentation
Data Layer: Shared File System
Virtualization
Host Host Host Host Host Host Host
4545
Leverage Existing Investment
4646
What Is Nick Still Thinking About Hadoop?
Source: http://www.smartdraw.com/solutions/information-technology/images/nick.png
• “I want to use my existing infrastructure, not buy new hardware”
• “I want to leverage the tools I already have”
• “Hadoop on Amazon is costing too much”
• “My data is in shared storage; do I have to move it?”
• “I want a low-risk way of trying Hadoop”
4747
Use Storage That Meets Your Needs
Storage type    Cost per gigabyte   $1M gets
SAN Storage     $2 - $10            0.5 petabytes, 200,000 IOPS, 8 Gbyte/sec
NAS Filers      $1 - $5             1 petabyte, 200,000 IOPS, 10 Gbyte/sec
Local Storage   $0.05               10 petabytes, 400,000 IOPS, 250 Gbyte/sec
4848
Leveraging Isilon as External HDFS
 Time to results: Analysis of data in place
 Lower risk using vSphere with Isilon
 Scale storage and compute independently
Data Layer – Hadoop on Isilon
Elastic Virtual Compute Layer
4949
Hybrid Storage Model to Get the Best of Both Worlds
 Master nodes:
• NameNode, JobTracker on
shared storage
• Leverage vSphere vMotion, HA
and FT
 Slave nodes
• TaskTracker, DataNode on local
storage
• Lower cost, scalable bandwidth
Shared Storage (master nodes), Local Storage (slave nodes)
5050
Achieving HA for the Entire Hadoop Stack
 Battle-tested HA technology
 Single mechanism to achieve HA for the entire Hadoop stack
 Simple to enable HA/FT
HDFS
(Hadoop Distributed File System)
HBase (Key-Value store)
MapReduce (Job Scheduling/Execution System)
Pig (Data Flow), Hive (SQL)
BI Reporting, ETL Tools
Management Server
ZooKeeper (Coordination)
HCatalog
RDBMS
Namenode
Jobtracker
Hive MetaDB, HCatalog MDB
Server
5151
Leveraging Other VMware Assets
 Monitoring with vCenter Operations Manager
• Gain comprehensive visibility
• Eliminate manual processes with intelligent automation
• Proactively manage operations
 Future: vCloud Automation Center, Software-defined Storage
5252
Get Maximum Value from Existing Tools and Infrastructure
Compute Layer: Hadoop test/dev; Hadoop production; HBase; SQL on Hadoop (HAWQ, Impala, Drill); NoSQL (Cassandra, Mongo); Other (Spark, Shark, Solr, Platfora)
Data Layer: Shared File System
Virtualization
Host Host Host Host Host Host Host
5353
Pivotal and VMware:
Partnering to Virtualize Hadoop
5454
Virtualization Benefits
 Multi-tenancy (users, business units) with strong vSphere-based
isolation
 Multiple big data applications and compute engines can access
common HDFS data
 Agility to scale Hadoop nodes at run-time
 Provide On-Demand Hadoop / Hadoop as a Service
5555
Busting Myths About Virtual Hadoop
Myth: Virtualization will add significant performance overhead
Reality: Virtual Hadoop performance is comparable to bare metal
Myth: Hadoop cannot work with shared storage
Reality: Shared storage is a valid choice, especially for smaller clusters
Myth: Virtualization necessitates the use of shared storage
Reality: Shared storage is useful for HA, but virtual Hadoop on DAS is very common
Myth: Hadoop distribution vendors don’t support virtual implementations
Reality: Pivotal HD is jointly tested, certified, and supported on vSphere
Sources: http://www.psychologytoday.com/files/u637/good-grief-charlie-brown.jpg
http://images2.wikia.nocookie.net/__cb20101130042247/peanuts/images/6/6d/Joe-cool-1-.jpg
5656
Native versus Virtual Platforms, 32 hosts, 16 disks/host
Source: http://www.vmware.com/resources/techresources/10360
5757
Harness the Flexibility of Virtualization
Hadoop Virtualization Extensions
5858
You Need Hadoop Virtualization Extensions
 Topology Extensions:
• Enable Hadoop to recognize additional virtualization layer for
read/write/balancing for proper replica placement
• Enable compute/data node separation without losing locality
 Elasticity Extensions:
• Ability to dynamically adjust resources allocated (CPU, memory, map/reduce
slots) to compute nodes
• Enables runtime elasticity of Hadoop nodes
6060
Current Hadoop Network Topology Not Virtualization Aware
/
D1 D2
R1: H1 H2 H3
R2: H4 H5 H6
R3: H7 H8 H9
R4: H10 H11 H12
• D = data center
• R = rack
• H = host
 Multiple replicas may end up on the same physical host in virtual environments
6161
HVE Adds a New Layer in Hadoop Network Topology
/
D1 D2
R1 R2 R3 R4
NG1 NG2 NG3 NG4 NG5 NG6 NG7 NG8
N1 N2 N3 N4 N5 N6 N7 N8 N9 N10 N11 N12 N13
• D = data center
• R = rack
• NG = node group
• N = node
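One way to see what the extra layer buys is to compute a tree distance between nodes: the number of hops from each node up to their closest common ancestor, summed. The sketch below assumes each node is identified by a path of the form `/data center/rack/node group/node`. The sample paths in the usage note are hypothetical, since the slide does not say which nodes belong to which node group, and the `distance` function is an illustration, not Hadoop's API.

```python
def distance(path_a, path_b):
    """Sum of hops from each node up to the closest common ancestor.

    Paths look like "/D1/R1/NG1/N1" (data center / rack / node group / node).
    """
    a = path_a.strip("/").split("/")
    b = path_b.strip("/").split("/")
    if len(a) != len(b):
        raise ValueError("paths must have the same number of levels")
    common = 0
    for left, right in zip(a, b):
        if left != right:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)
```

Under this model two VMs in the same node group (the same physical host) are at distance 2, while two VMs on different hosts in the same rack are at distance 4, so a placement or scheduling policy can tell "same host" apart from "same rack".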
6262
“Virtualization Aware” Replica Placement Policy During Write
Updated Policies:
• No replicas are placed on the same node or on nodes under the same node group
• 1st replica is on the local node or one of the nodes under the same node group as the writer
• 2nd replica is on a rack remote from the 1st replica
• 3rd replica is on the same rack as the 2nd replica
• Remaining replicas are placed randomly across racks to meet the minimum restrictions
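The updated policy can be sketched as a small selection routine. This is a simplified model under stated assumptions, not the HDFS implementation: the `Node` tuple and `place_replicas` function are hypothetical, the writer is assumed to be a cluster node that always takes the first replica, and node group identifiers are assumed to be unique across racks.

```python
import random
from collections import namedtuple

# group must be unique cluster-wide, e.g. "R1/NG1" (assumption for this sketch)
Node = namedtuple("Node", "name rack group")


def place_replicas(writer, nodes, replication=3, rng=random):
    """Pick `replication` nodes following the node-group-aware rules above."""
    chosen = [writer]  # 1st replica: the writer's own node (simplification)

    def pick(predicate):
        # Never reuse a node group that already holds a replica.
        used_groups = {node.group for node in chosen}
        candidates = [n for n in nodes
                      if n.group not in used_groups and predicate(n)]
        if not candidates:
            raise RuntimeError("no node satisfies the placement rules")
        chosen.append(rng.choice(candidates))

    if replication >= 2:
        pick(lambda n: n.rack != writer.rack)       # 2nd: a remote rack
    if replication >= 3:
        pick(lambda n: n.rack == chosen[1].rack)    # 3rd: same rack as the 2nd
    while len(chosen) < replication:
        pick(lambda n: True)                        # rest: anywhere allowed
    return chosen
```

Because every pick excludes node groups already used, no two replicas can land on VMs that share a physical host, which is the failure mode the previous topology slide warns about.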
6464
HVE Achieves Vertical Scaling of Hadoop Nodes
 A VM’s resource boundary is already elastic
• VM resource settings: reserved (lower limit) and maximum (upper limit)
• If resources are tight, VMs compete for resources (between reserved and maximum) based on shares
• “Stealing” resources without notifying apps sometimes causes very bad performance
• Thus, resource changes need to be made app-aware
 Current Hadoop resource schedulers are static
• MRV1 – slots
• YARN – resources (Memory for now, YARN-2 will include CPUs)
 HVE Elasticity patches
• Enable flexible resource model for each
Hadoop node
• Change resources at runtime
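The "reserved, maximum, shares" model in the first bullet group can be illustrated with a simplified allocator: every VM first receives its reservation, and the remaining capacity is divided in proportion to shares, never exceeding any VM's maximum. This is a teaching sketch under those assumptions, not the vSphere scheduler and not the HVE patches; the `allocate` function and its input format are hypothetical.

```python
def allocate(capacity, vms):
    """Divide `capacity` among VMs.

    vms maps a VM name to (reservation, limit, shares).
    Returns a dict mapping each VM name to its allocation.
    """
    alloc = {name: res for name, (res, _limit, _shares) in vms.items()}
    free = capacity - sum(alloc.values())
    if free < 0:
        raise ValueError("reservations exceed capacity")

    # VMs that can still grow and have a claim on spare capacity.
    active = {n for n, (res, limit, shares) in vms.items()
              if limit > res and shares > 0}
    while free > 1e-9 and active:
        total_shares = sum(vms[n][2] for n in active)
        granted = 0.0
        capped = set()
        for name in active:
            offer = free * vms[name][2] / total_shares   # share-proportional offer
            room = vms[name][1] - alloc[name]            # headroom below the limit
            take = min(offer, room)
            alloc[name] += take
            granted += take
            if take >= room - 1e-9:
                capped.add(name)
        free -= granted
        active -= capped
        if not capped:
            break  # everything on offer was accepted
    return alloc
```

The point of the slide is what happens next: when an allocation shrinks this way without the application knowing, a Hadoop node keeps scheduling work for resources it no longer has, which is why the elasticity patches let the node's resource model change at runtime.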
6565
Pivotal HD Is Best Suited for Virtualization
 Only distribution that ships with VMware Hadoop Virtualization Extensions (HVE)
• Fully tested
• Ensures proper HDFS replica placement on vSphere
• Improves MapReduce performance through better data locality on vSphere
• Allows dynamic scaling of Hadoop Compute Nodes
 Certified on vSphere
 VMware Serengeti deploys and scales Pivotal HD on vSphere out of the box
• Only YARN-based distribution supported by Serengeti
6666
Conclusion
6767
Big Data Platform Building Blocks and Key Benefits
Serengeti
vSphere
Resource
Management
Hadoop
Virtualization
Extensions
Partnership
6868
Thank You
projectserengeti.org
gopivotal.com/pivotal-products/pivotal-data-fabric/pivotal-hd
Kevin Leong
kleong@vmware.com
Abhishek Kashyap
akashyap@gopivotal.com
THANK YOU
Big Data Platform Building Blocks: Serengeti,
Resource Management,
and Virtualization Extensions
Abhishek Kashyap, Pivotal
Kevin Leong, VMware
VAPP5762
#VAPP5762

Contenu connexe

Tendances

YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And CloudYARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And CloudDataWorks Summit
 
Achieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azureAchieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azureUtkarsh Pandey
 
Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Ontico
 
Overview of stinger interactive query for hive
Overview of stinger   interactive query for hiveOverview of stinger   interactive query for hive
Overview of stinger interactive query for hiveDavid Kaiser
 
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudPart 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudCloudera, Inc.
 
Interactive query in hadoop
Interactive query in hadoopInteractive query in hadoop
Interactive query in hadoopRommel Garcia
 
Hadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezHadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezJan Pieter Posthuma
 
What the Enterprise Requires - Business Continuity and Visibility
What the Enterprise Requires - Business Continuity and VisibilityWhat the Enterprise Requires - Business Continuity and Visibility
What the Enterprise Requires - Business Continuity and VisibilityCloudera, Inc.
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightGert Drapers
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemGregg Barrett
 
Realtime Analytics in Hadoop
Realtime Analytics in HadoopRealtime Analytics in Hadoop
Realtime Analytics in HadoopRommel Garcia
 
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics MeetupIntroduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetupiwrigley
 
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...DataWorks Summit
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?DataWorks Summit
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14John Sing
 
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...VMworld
 
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...Hortonworks
 

Tendances (20)

YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And CloudYARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
 
50 Shades of SQL
50 Shades of SQL50 Shades of SQL
50 Shades of SQL
 
Achieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azureAchieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azure
 
Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...
 
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the EnterpriseDeploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
 
Overview of stinger interactive query for hive
Overview of stinger   interactive query for hiveOverview of stinger   interactive query for hive
Overview of stinger interactive query for hive
 
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudPart 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
 
Interactive query in hadoop
Interactive query in hadoopInteractive query in hadoop
Interactive query in hadoop
 
Hadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezHadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to Tez
 
What the Enterprise Requires - Business Continuity and Visibility
What the Enterprise Requires - Business Continuity and VisibilityWhat the Enterprise Requires - Business Continuity and Visibility
What the Enterprise Requires - Business Continuity and Visibility
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsight
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystem
 
Realtime Analytics in Hadoop
Realtime Analytics in HadoopRealtime Analytics in Hadoop
Realtime Analytics in Hadoop
 
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics MeetupIntroduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
 
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?
 
Interactive query using hadoop
Interactive query using hadoopInteractive query using hadoop
Interactive query using hadoop
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14
 
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
 
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
 

Similaire à VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Management, and Virtualization Extensions

Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanJim Kaskade
 
1. beyond mission critical virtualizing big data and hadoop
1. beyond mission critical   virtualizing big data and hadoop1. beyond mission critical   virtualizing big data and hadoop
1. beyond mission critical virtualizing big data and hadoopChiou-Nan Chen
 
Hadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the ExpertsHadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the ExpertsDataWorks Summit/Hadoop Summit
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsDataWorks Summit/Hadoop Summit
 
Big Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the ExpertsBig Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the ExpertsDataWorks Summit/Hadoop Summit
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick viewRajesh Nadipalli
 
HdInsight essentials Hadoop on Microsoft Platform
HdInsight essentials Hadoop on Microsoft PlatformHdInsight essentials Hadoop on Microsoft Platform
HdInsight essentials Hadoop on Microsoft Platformnvvrajesh
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick viewRajesh Nadipalli
 
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...Amazon Web Services
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the OrganizationSeeling Cheung
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Management, and Virtualization Extensions

  • 1. Big Data Platform Building Blocks: Serengeti, Resource Management, and Virtualization Extensions. Abhishek Kashyap, Pivotal; Kevin Leong, VMware. VAPP5762 (#VAPP5762)
  • 2. Agenda: Big Data, Hadoop, and What It Means to You; The VMware Big Data Platform (Operate Clusters Simply, Share Infrastructure Efficiently, Leverage Existing Investment); Pivotal and VMware: Partnering to Virtualize Hadoop; Conclusion and Q&A
  • 3. Big Data, Hadoop, and What It Means to You
  • 4. What is Hadoop? A framework that allows for distributed processing of large data sets across clusters of commodity servers: it stores large amounts of data and processes the data stored. Inspired by Google’s MapReduce and Google File System (GFS) papers. Apache open source project: initial work done at Yahoo! starting in 2005; open sourced in 2009; there is now a very active open source community
  • 5. What is Hadoop? Storage and compute in one framework; open source project of the Apache Software Foundation; Java-intensive programming required. Two core components: HDFS (scalable storage in the Hadoop Distributed File System) and MapReduce (compute via the MapReduce distributed processing platform)
  • 6. Why Hadoop? HDFS provides cheap and reliable storage on commodity hardware; in-place data analysis, rather than moving data from file systems to data warehouses; ability to analyze structured and unstructured data. Enables better business decisions from more types of data at higher speeds and lower costs
  • 7. Use Case: Data Warehouse Augmentation / Offload. Challenges: existing EDW used for low-value, resource-consuming ETL processes; planned growth will far exceed compute capacity; hard to do analytics or even basic reporting on the EDW system. Objectives: reduce EDW total cost of ownership; enable longer data retention to support analytics and accelerate time to market; migrate ETL off the EDW to free up compute resources
  • 8. Use Case: Retailer Trend Analysis. Deep historical reporting for retail trends: a credit card company loads 10 years of data for all retailers (hundreds of TB); runs a MapReduce job to develop a historical picture of retailers in a specific area; loads results from Hadoop into the data warehouse and analyzes them further with standard BI/statistics packages. Why do this in Hadoop? Ability to store years of data cost-effectively; data available for immediate recall (not on tapes or flat files); no need to ETL/normalize the data; data exists in its valuable, original format; offload intensive computation from the DW; ability to combine structured and unstructured data
  • 9. Pivotal HD (stack diagram). Apache: HDFS; HBase; Pig, Hive, Mahout; MapReduce; Sqoop; Flume; Resource Management & Workflow (YARN, ZooKeeper)
  • 10. Pivotal HD (stack diagram). Apache: HDFS; HBase; Pig, Hive, Mahout; MapReduce; Sqoop; Flume; Resource Management & Workflow (YARN, ZooKeeper). Pivotal HD Enterprise: Command Center (deploy, configure, monitor, manage); Hadoop Virtualization (HVE); Data Loader
  • 11. Pivotal HD (stack diagram). Apache: HDFS; HBase; Pig, Hive, Mahout; MapReduce; Sqoop; Flume; Resource Management & Workflow (YARN, ZooKeeper). Pivotal HD Enterprise: Command Center (deploy, configure, monitor, manage); Hadoop Virtualization (HVE); Data Loader. HAWQ, Advanced Database Services: Xtension Framework; Catalog Services; Query Optimizer; Dynamic Pipelining; ANSI SQL + Analytics
  • 12. Pivotal HD (stack diagram). Apache: HDFS; HBase; Pig, Hive, Mahout; MapReduce; Sqoop; Flume; Resource Management & Workflow (YARN, ZooKeeper). Pivotal HD Enterprise: Command Center (deploy, configure, monitor, manage); Data Loader; Hadoop Virtualization (HVE). HAWQ, Advanced Database Services: Xtension Framework; Catalog Services; Query Optimizer; Dynamic Pipelining; ANSI SQL + Analytics. Also shown: Spring XD; Pivotal Analytics; Pivotal Chorus & Alpine Miner; MoreVRP
  • 13. The VMware Big Data Platform
  • 14. The Big Data Journey in the Enterprise. Stage 1, Hadoop Piloting: often starts with a line of business; try 1 or 2 use cases to explore the value of Hadoop. Stage 2, Hadoop Production: serves a few departments; more use cases; growing number and size of clusters; core Hadoop + components. Stage 3, Cloud Analytics Platform: serves many departments; often part of a mission-critical workflow; fully integrated with analytics/BI tools. (Diagram axes: scale from 0 to 10s to 100s of nodes; standalone to integrated)
  • 15. Getting from Here to There (diagram): hosts under a virtualization layer, with a shared file system, a data layer, and a compute layer. Workload: Hadoop test/dev
  • 16. Getting from Here to There (diagram): hosts under a virtualization layer, with a shared file system, a data layer, and a compute layer. Workloads: Hadoop test/dev; Hadoop production; Hadoop production; Hadoop experimentation
  • 17. Getting from Here to There (diagram): hosts under a virtualization layer, with a shared file system, a data layer, and a compute layer. Workloads: Hadoop test/dev; HBase; Hadoop production; SQL on Hadoop (HAWQ, Impala, Drill); NoSQL (Cassandra, Mongo); other (Spark, Shark, Solr, Platfora)
  • 18. Benefits of Virtualization at Each Stage. Stage 1, Hadoop Piloting: rapid deployment; on-the-fly cluster resizing; flexible configuration; automation of cluster lifecycle. Stage 2, Hadoop Production: high availability; consolidation; tiered SLAs; elastic scaling. Stage 3, Cloud Analytics Platform: mixed workloads; right tool at the right time; flexible and elastic infrastructure. (Diagram axes: scale from 0 to 10s to 100s of nodes; standalone to integrated)
  • 19. A Brief History of Project Serengeti (and Big Data at VMware)
  • 20. Big Data Initiatives at VMware. Serengeti: open source project; tool to simplify virtualized Hadoop deployment and operations. vSphere Resource Management: advanced resource management on vSphere; Big Data application-specific extension to DRS. Hadoop Virtualization Extensions: virtualization changes for core Hadoop; contributed back to Apache Hadoop
  • 21. Clustered Workload Management: The Next Frontier (diagram): Serengeti for Hadoop management; ESXi and vCenter for virtualization. Source: http://www.conferencebike.com/image/generated/792.png
  • 22. Serengeti Project History. Serengeti 0.5 (June 2012), 0.6 (August 2012), 0.7 (October 2012), 0.8 (April 2013): Hadoop in 10 min; highly available Hadoop; time to insight; configuring Hadoop; compute elasticity; configuring placement and topology; HBase; MapR; CDH4; performance best practices. Serengeti 0.9 / BDE Beta (June 2013): integrated GUI; automatic elasticity; YARN / Pivotal HD
  • 23. Big Data Extensions: Serengeti-vCenter Integration (diagram): Big Data Extensions + vCenter for Hadoop management; ESXi for virtualization
  • 26. What Does Nick Think About Hadoop? "I don’t want to be the bottleneck when it comes to provisioning Hadoop clusters." "I need sizing flexibility, because my Hadoop users don’t know how large a cluster they need." "I want to establish a repeatable process for deploying Hadoop clusters." "I don’t really know that much about Hadoop." "I want to better manage the jumble of LOB Hadoop clusters in my enterprise." Source: http://www.smartdraw.com/solutions/information-technology/images/nick.png
  • 27. Choose Your Own Adventure. Source: http://www.vintagecomputing.com/wp-content/images/retroscan/supercomputer_cyoa_large.jpg
  • 29. Deploy Hadoop Clusters in Minutes From a manual process (server preparation, OS installation, network configuration, Hadoop installation and configuration) to a fully automated one, using the GUI
  • 30. How It Works  BDE is packaged as a virtual appliance, which can be easily deployed on vCenter  BDE works as a vCenter extension and establishes an SSL connection with vCenter  BDE clones VMs from the template and controls/configures the VMs through vCenter [Diagram labels: Hosts, Virtualization Platform, Hadoop Nodes, vCenter, Management Server, Template, Virtual Appliance, VM Cloning]
  • 31. User-specified Customizations Using a Cluster Specification File  Storage configuration: choice of shared or local  High Availability option  Number of nodes and resource configuration  VM placement policies
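The four customization areas on this slide can be pictured as a small JSON-style document. The sketch below is only illustrative: every field name (`nodeGroups`, `instanceNum`, `haFlag`, `placementPolicies`, and so on) and every value is an assumption made for this example and is not taken from the slides or confirmed against the Serengeti schema. The `validate_spec` helper is likewise hypothetical.

```python
import json

# Hypothetical cluster specification. Field names and values are assumptions
# for illustration only, not the confirmed Serengeti/BDE schema.
cluster_spec = {
    "nodeGroups": [
        {
            "name": "master",
            "roles": ["namenode", "jobtracker"],
            "instanceNum": 1,                             # number of nodes
            "cpuNum": 2,                                  # resource configuration
            "memCapacityMB": 8192,
            "storage": {"type": "SHARED", "sizeGB": 50},  # shared storage
            "haFlag": "on",                               # High Availability option
        },
        {
            "name": "worker",
            "roles": ["datanode", "tasktracker"],
            "instanceNum": 4,
            "cpuNum": 2,
            "memCapacityMB": 4096,
            "storage": {"type": "LOCAL", "sizeGB": 100},  # local storage
            "haFlag": "off",
            "placementPolicies": {"instancePerHost": 1},  # VM placement policy
        },
    ]
}


def validate_spec(spec):
    """Return a list of human-readable problems found in a spec (empty if none)."""
    errors = []
    groups = spec.get("nodeGroups", [])
    if not groups:
        errors.append("spec defines no node groups")
    for group in groups:
        name = group.get("name", "<unnamed>")
        if group.get("instanceNum", 0) < 1:
            errors.append(f"{name}: instanceNum must be at least 1")
        if group.get("storage", {}).get("type") not in ("SHARED", "LOCAL"):
            errors.append(f"{name}: storage type must be SHARED or LOCAL")
        if group.get("haFlag") not in ("on", "off"):
            errors.append(f"{name}: haFlag must be 'on' or 'off'")
    return errors


if __name__ == "__main__":
    print(validate_spec(cluster_spec))
    print(json.dumps(cluster_spec, indent=2))
```

Keeping the master group on shared storage with HA enabled and the workers on local storage mirrors the hybrid storage model described later in the deck.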
  • 33. Agility and Operational Simplicity [Diagram: Compute Layer: Hadoop test/dev; Data Layer: Shared File System; Virtualization across a pool of hosts]
  • 36. Challenges of Running Hadoop in the Enterprise Dept A: recommendation engine (Production, Test, Experimentation) Dept B: ad targeting (Production, Test, Experimentation) Data: log files, social data, transaction data, historical customer behavior On the horizon: NoSQL, real-time SQL, … Pain Points: 1. Cluster sprawl 2. Redundant common data in separate clusters 3. Inefficient use of resources: some clusters could be running at capacity while other clusters sit idle
  • 37. The Virtualization Advantage One physical platform to support multiple virtual big data clusters [Diagram labels: Recommendation engine and Ad targeting, each with Production, Test, Experimentation; on the shared platform: Production Recommendation Engine, Production Ad Targeting, Test/Dev, Experimentation]
  • 38. What Other Things Does Nick Think About Hadoop?  “I want to scale out when my workload requires it”  “My Hadoop users ask for large Hadoop clusters, which end up underutilized”  “I want to offer Hadoop-as-a-Service in my private cloud”  “I want to get all Hadoop clusters into a centralized environment to minimize spend” Source: http://www.smartdraw.com/solutions/information-technology/images/nick.png
  • 39. Achieving Multi-tenancy  Resource Isolation • Control the greedy noisy neighbor • Reserve resources to meet needs  Version Isolation • Allow concurrent OS, App, Distro versions  Security Isolation • Provide privacy between users/groups • Runtime and data privacy required [Diagram: VMware vSphere + Serengeti on a pool of hosts]
  • 40. Separating Hadoop Data and Compute for Elasticity Combined Storage/Compute (Hadoop in VM):  VM lifecycle determined by Datanode  Limited elasticity  Limited to Hadoop Multi-Tenancy Separate Storage:  Separate compute from data  Elastic compute  Enable shared workloads  Raise utilization Separate Compute Tenants:  Separate virtual clusters per tenant  Stronger VM-grade security and resource isolation  Enable deployment of multiple Hadoop runtime versions
  • 41. Dynamic Hadoop Scaling  Deploy separate compute clusters for different tenants sharing HDFS  Commission/decommission compute nodes according to priority and available resources [Diagram: Compute layer: Experimentation and Production recommendation engine clusters, each with a Job Tracker and Compute VMs drawn from a dynamic resource pool; Data layer shared by both; VMware vSphere + Serengeti underneath]
  • 43. [Diagram: Virtual Hadoop Manager (VHM)] Inputs: Hadoop state and stats from the Job Tracker (slots used, pending work); VC state and stats and VM configuration from vCenter Server and the vCenter DB; cluster configuration from Serengeti Outputs: Hadoop actions (commands to the Job Tracker: decommission, recommission Task Trackers); VC actions (manual/auto power on/off) Internal modules: Algorithms, Cluster Configuration, Resource Management Module
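The control loop suggested by this diagram (read Hadoop and vCenter statistics, then recommission or decommission compute nodes) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the VHM algorithm: the `ClusterStats` fields, the `plan_action` function, the fixed slots-per-VM figure, and the decision rule are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterStats:
    """Hypothetical snapshot of what a manager might read from the Job Tracker and vCenter."""
    pending_tasks: int      # work waiting for a slot
    slots_used: int         # slots currently busy
    slots_total: int        # slots on powered-on compute VMs
    powered_off_vms: int    # compute VMs available to recommission
    slots_per_vm: int = 2   # assumed fixed slot count per compute VM


def plan_action(stats, spare_capacity_vms, min_running_vms=1):
    """Return ('recommission', n), ('decommission', n) or ('none', 0).

    Assumed rule: grow when pending work exceeds free slots, shrink when
    there is no pending work and whole VMs' worth of slots sit idle.
    """
    free_slots = stats.slots_total - stats.slots_used
    if stats.pending_tasks > free_slots:
        needed = math.ceil((stats.pending_tasks - free_slots) / stats.slots_per_vm)
        n = min(needed, stats.powered_off_vms, spare_capacity_vms)
        return ("recommission", n) if n > 0 else ("none", 0)
    if stats.pending_tasks == 0:
        running_vms = stats.slots_total // stats.slots_per_vm
        idle_vms = free_slots // stats.slots_per_vm
        n = min(idle_vms, running_vms - min_running_vms)
        if n > 0:
            return ("decommission", n)
    return ("none", 0)


if __name__ == "__main__":
    busy = ClusterStats(pending_tasks=10, slots_used=8, slots_total=8, powered_off_vms=3)
    print(plan_action(busy, spare_capacity_vms=5))
```

The `spare_capacity_vms` argument stands in for the "available resources" that the previous slide says scaling decisions depend on; a per-tenant priority would be an additional input.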
  • 44. Combining Elasticity and Multi-tenancy [Diagram: Compute Layer: Hadoop test/dev, Hadoop production, Hadoop production, Hadoop experimentation; Data Layer: Shared File System; Virtualization across a pool of hosts]
  • 46. What Is Nick Still Thinking About Hadoop?  “I want to use my existing infrastructure, not buy new hardware”  “I want to leverage the tools I already have”  “Hadoop on Amazon is costing too much”  “My data is in shared storage; do I have to move it?”  “I want a low-risk way of trying Hadoop” Source: http://www.smartdraw.com/solutions/information-technology/images/nick.png
  • 47. Use Storage That Meets Your Needs  SAN Storage: $2 - $10/Gigabyte; $1M gets 0.5 Petabytes, 200,000 IOPS, 8 Gbytes/sec  NAS Filers: $1 - $5/Gigabyte; $1M gets 1 Petabyte, 200,000 IOPS, 10 Gbytes/sec  Local Storage: $0.05/Gigabyte; $1M gets 10 Petabytes, 400,000 IOPS, 250 Gbytes/sec
  • 48. Leveraging Isilon as External HDFS  Time to results: Analysis of data in place  Lower risk using vSphere with Isilon  Scale storage and compute independently [Diagram: Elastic Virtual Compute Layer on top of a Data Layer: Hadoop on Isilon]
  • 49. Hybrid Storage Model to Get the Best of Both Worlds  Master nodes (shared storage): • NameNode, JobTracker on shared storage • Leverage vSphere vMotion, HA and FT  Slave nodes (local storage): • TaskTracker, DataNode on local storage • Lower cost, scalable bandwidth
  • 50. Achieving HA for the Entire Hadoop Stack  Battle-tested HA technology  Single mechanism to achieve HA for the entire Hadoop stack  Simple to enable HA/FT [Diagram: Hadoop stack: ETL Tools, BI Reporting, RDBMS; Pig (Data Flow), Hive (SQL), HCatalog; MapReduce (Job Scheduling/Execution System); HBase (Key-Value store); HDFS (Hadoop Distributed File System); ZooKeeper (Coordination); Management Server. Master services: Namenode, Jobtracker, Hive MetaDB, HCatalog MDB, Server]
  • 51. Leveraging Other VMware Assets  Monitoring with vCenter Operations Manager • Gain comprehensive visibility • Eliminate manual processes with intelligent automation • Proactively manage operations  Future: vCloud Automation Center, Software-defined Storage
  • 52. Get Maximum Value from Existing Tools and Infrastructure [Diagram: Compute Layer: Hadoop test/dev; Hadoop production; HBase; SQL on Hadoop (HAWQ, Impala, Drill); NoSQL (Cassandra, Mongo); Other (Spark, Shark, Solr, Platfora). Data Layer: Shared File System. Virtualization across a pool of hosts]
  • 53. Pivotal and VMware: Partnering to Virtualize Hadoop
  • 54. Virtualization Benefits  Multi-tenancy (users, business units) with strong vSphere-based isolation  Multiple big data applications and compute engines can access common HDFS data  Agility to scale Hadoop nodes at run-time  Provide On-Demand Hadoop / Hadoop as a Service
  • 55. Busting Myths About Virtual Hadoop  Myth: Virtualization will add significant performance overhead. Reality: Virtual Hadoop performance is comparable to bare metal  Myth: Hadoop cannot work with shared storage. Reality: Shared storage is a valid choice, especially for smaller clusters  Myth: Virtualization necessitates the use of shared storage. Reality: Shared storage is useful for HA, but virtual Hadoop on DAS is very common  Myth: Hadoop distribution vendors don’t support virtual implementations. Reality: Pivotal HD is jointly tested, certified, and supported on vSphere Source: http://www.psychologytoday.com/files/u637/good-grief-charlie-brown.jpg, http://images2.wikia.nocookie.net/__cb20101130042247/peanuts/images/6/6d/Joe-cool-1-.jpg
  • 56. Native versus Virtual Platforms, 32 hosts, 16 disks/host Source: http://www.vmware.com/resources/techresources/10360
  • 57. Harness the Flexibility of Virtualization: Hadoop Virtualization Extensions
  • 58. You Need Hadoop Virtual Extensions  Topology Extensions: • Enable Hadoop to recognize the additional virtualization layer for read/write/balancing, for proper replica placement • Enable compute/data node separation without losing locality  Elasticity Extensions: • Ability to dynamically adjust resources allocated (CPU, memory, map/reduce slots) to compute nodes • Enable runtime elasticity of Hadoop nodes
  • 60. Current Hadoop Network Topology Not Virtualization Aware • / = root • D = data center • R = rack • H = host [Diagram: / > D1, D1 > R1 to R4 > H1 to H12, three hosts per rack]  Multiple replicas may end up on the same physical host in virtual environments
  • 61. HVE Adds a New Layer in Hadoop Network Topology • / = root • D = data center • R = rack • NG = node group • N = node [Diagram: / > D1, D2 > R1 to R4 > NG1 to NG8 > N1 to N13]
  • 62. “Virtualization Aware” Replica Placement Policy During Write Updated Policies: • No two replicas are placed on the same node or on nodes under the same node group • 1st replica is on the local node or on one of the nodes in the writer’s node group • 2nd replica is on a different rack from the 1st replica • 3rd replica is on the same rack as the 2nd replica • Remaining replicas are placed randomly across racks, subject to the minimum restriction
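The updated policies above can be sketched as a small placement routine. This is an illustrative model only, assuming that a node group corresponds to the set of VMs sharing one physical host; the `Node` class and `choose_replicas` function are hypothetical names for this example and are not the Hadoop implementation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    rack: str
    node_group: str  # assumed: VMs on the same physical host share a node group


def choose_replicas(writer, nodes, replication=3, rng=None):
    """Pick replica targets following the node-group-aware rules on the slide."""
    rng = rng or random.Random()
    chosen, used_groups = [], set()

    def pick(preferred):
        # Rule: never reuse a node group (which also rules out reusing a node).
        eligible = [n for n in preferred if n.node_group not in used_groups]
        if not eligible:  # fall back to any node in an unused node group
            eligible = [n for n in nodes if n.node_group not in used_groups]
        if not eligible:
            return None
        node = rng.choice(eligible)
        chosen.append(node)
        used_groups.add(node.node_group)
        return node

    while len(chosen) < replication:
        if not chosen:
            # 1st replica: the writer's node, else a node in the writer's node group
            preferred = [n for n in nodes if n == writer] or [
                n for n in nodes if n.node_group == writer.node_group]
        elif len(chosen) == 1:
            # 2nd replica: a different rack from the 1st
            preferred = [n for n in nodes if n.rack != chosen[0].rack]
        elif len(chosen) == 2:
            # 3rd replica: same rack as the 2nd (different node group via pick)
            preferred = [n for n in nodes if n.rack == chosen[1].rack]
        else:
            preferred = nodes  # remaining replicas: anywhere still eligible
        if pick(preferred) is None:
            break  # not enough distinct node groups to place more replicas
    return chosen


if __name__ == "__main__":
    cluster = [Node(f"N{r}{g}{i}", f"R{r}", f"NG{r}{g}")
               for r in (1, 2) for g in (1, 2) for i in (1, 2)]
    for node in choose_replicas(cluster[0], cluster, rng=random.Random(0)):
        print(node)
```

Without the node-group check, two replicas could land on different VMs of the same physical host, which is the failure mode the previous topology slide warns about.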
  • 64. HVE Achieves Vertical Scaling of Hadoop Nodes  A VM’s resource boundary is already elastic • VM resource settings: reserved (lower limit) and maximum (upper limit) • If resources are tight, VMs compete for resources (between reserved and maximum) based on shares • “Stealing” resources without notifying applications sometimes causes very bad performance • Thus, resource changes need to be made in an application-aware way  Current Hadoop resource schedulers are static • MRv1: slots • YARN: resources (memory for now; YARN-2 will include CPUs)  HVE Elasticity patches • Enable a flexible resource model for each Hadoop node • Change resources at runtime
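The reserved/maximum/shares behavior described in the first bullet can be illustrated with a simplified proportional-share calculation. This is a sketch of the general idea only, not vSphere's actual scheduler: the `allocate` function, the tuple layout, and the example numbers are assumptions made for illustration.

```python
def allocate(capacity, vms):
    """Divide `capacity` among VMs.

    `vms` maps a VM name to (reserved, maximum, shares), with shares > 0.
    Every VM first receives its reservation; what is left is split in
    proportion to shares, never exceeding a VM's maximum. Capacity a capped
    VM cannot use is redistributed among the remaining VMs.
    """
    alloc = {name: reserved for name, (reserved, _, _) in vms.items()}
    remaining = capacity - sum(alloc.values())
    if remaining < 0:
        raise ValueError("capacity is smaller than the sum of reservations")

    active = {name for name, (reserved, maximum, _) in vms.items() if maximum > reserved}
    while remaining > 1e-9 and active:
        total_shares = sum(vms[name][2] for name in active)
        budget = remaining
        capped = set()
        for name in sorted(active):
            grant = budget * vms[name][2] / total_shares
            room = vms[name][1] - alloc[name]
            give = min(grant, room)
            alloc[name] += give
            remaining -= give
            if give >= room - 1e-9:
                capped.add(name)  # this VM has reached its maximum
        if not capped:
            break  # the whole budget was handed out this round
        active -= capped
    return alloc


if __name__ == "__main__":
    # Hypothetical memory figures in GB: (reserved, maximum, shares)
    vms = {"A": (2, 8, 2000), "B": (2, 4, 1000), "C": (2, 16, 1000)}
    print(allocate(16, vms))
```

In this model a VM's allocation can shrink whenever a neighbor's demand grows, which is the "stealing" the slide refers to; the elasticity patches aim to let Hadoop adjust its slots or container resources to match such changes.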
  • 65. Pivotal HD Is Best Suited for Virtualization  Only distribution that ships with VMware Hadoop Virtual Extensions (HVE) • Fully tested • Ensures proper HDFS replica placement on vSphere • Improves MapReduce performance through better data locality on vSphere • Allows dynamic scaling of Hadoop compute nodes  Certified on vSphere  VMware Serengeti deploys and scales Pivotal HD on vSphere out of the box • Only YARN-based distribution supported by Serengeti
  • 67. Big Data Platform Building Blocks and Key Benefits: Serengeti; vSphere Resource Management; Hadoop Virtualization Extensions; Partnership