© 2009 VMware Inc. All rights reserved
Architecting Virtualized Infrastructure for Big Data
Richard McDougall
@richardmcdougll
CTO, Application Infrastructure, Big Data Lead, VMware, Inc
2
Cloud: Big Shifts in Simplification and Optimization
1. Reduce the Complexity
to simplify operations and maintenance
2. Dramatically Lower Costs
to redirect investment into value-add opportunities
3. Enable Flexible, Agile IT Service Delivery
to meet and anticipate the needs of the business
3
Infrastructure, Apps and now Data…
Private / Public
Build, Run, Manage
• Simplify Infrastructure with Cloud
• Simplify App Platform through PaaS
• Simplify Data
4
Trend 1/3: New Data Growing at 60% Y/Y
Source: The Information Explosion, 2009
Data sources: medical imaging, sensors, CAD/CAM, appliances, machine data, digital movies, digital photos, digital TV, audio, camera phones, RFID, satellite images, logs, scanners, Twitter
[Chart: exabytes of information stored]
• 20 zettabytes by 2015
• 1 yottabyte by 2030
Yes, you are part of the yotta generation…
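At a constant 60% per year, data volume doubles roughly every 1.5 years and grows about tenfold in five years. The sketch below checks that arithmetic. It assumes constant compound growth, and the 100 TB starting volume is a hypothetical placeholder, not a figure from the source.

```python
import math

GROWTH_RATE = 0.60  # 60% year-over-year growth, as stated on the slide


def doubling_time_years(rate: float) -> float:
    """Years for a quantity compounding at `rate` per year to double."""
    return math.log(2) / math.log(1 + rate)


def project(volume: float, rate: float, years: float) -> float:
    """Volume after `years` of constant compound growth at `rate`."""
    return volume * (1 + rate) ** years


if __name__ == "__main__":
    print(f"Doubling time: {doubling_time_years(GROWTH_RATE):.2f} years")
    # Hypothetical starting point: 100 TB of enterprise data in year 0.
    for year in range(6):
        print(f"Year {year}: {project(100.0, GROWTH_RATE, year):,.1f} TB")
```

Any starting volume works the same way; only the growth rate determines the doubling time.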
5
Data Growth in the Enterprise
6
Trend 2/3: Big Data – Driven by Real-World Benefit
7
Trend 3/3: Value from Data Exceeds Hardware Cost
• Value from the intelligence of data analytics now outstrips the cost of hardware
  - Hadoop enables the use of hardware that costs 10x less
  - Hardware cost halves every 18 months
[Chart: value vs. cost. Big iron: $40k/CPU; commodity cluster: $1k/CPU]
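If hardware cost halves every 18 months, the cost after t years is c0 · 0.5^(t/1.5). A minimal sketch, assuming a constant halving period and using the slide's $1k/CPU commodity price as the starting point; `cost_after` is a hypothetical helper name.

```python
HALVING_PERIOD_YEARS = 1.5  # "hardware cost halving every 18 months"
BIG_IRON_USD_PER_CPU = 40_000  # slide figure
COMMODITY_USD_PER_CPU = 1_000  # slide figure


def cost_after(initial_cost: float, years: float,
               halving_period: float = HALVING_PERIOD_YEARS) -> float:
    """Cost after `years`, assuming cost halves every `halving_period` years."""
    return initial_cost * 0.5 ** (years / halving_period)


if __name__ == "__main__":
    ratio = BIG_IRON_USD_PER_CPU / COMMODITY_USD_PER_CPU
    print(f"Big iron costs {ratio:.0f}x more per CPU than a commodity cluster")
    for years in (0, 1.5, 3.0, 4.5):
        price = cost_after(COMMODITY_USD_PER_CPU, years)
        print(f"After {years} years: ${price:,.0f}/CPU")
```

Under this assumption, the commodity price falls to a quarter of its starting value after three years.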
8
A Holistic View of a Big Data System
• ETL
• Real-time streams
• Unstructured data (HDFS)
• Real-time structured database (HBase, GemFire, Cassandra)
• Big SQL (Greenplum, Aster Data, etc.)
• Batch processing
• Real-time processing (S4, Storm)
• Analytics
9
Big Data Frameworks and Characteristics
Framework | Scale of data | Scale of cluster | Computable data? | Local disks?
File system (Gluster, Isilon, etc.) | 10s of PB | 100s | Some | Yes, for cost
Map-reduce (Hadoop) | 100s of PB | 1,000s | Yes | Yes, for cost, bandwidth and availability
Big SQL (Greenplum, Aster Data, Netezza, …) | PBs | 100s | Some | Yes, for cost and bandwidth
NoSQL (Cassandra, HBase, …) | Trillions of rows | 100s | Some | Yes, for cost and availability
In-memory (Redis, GemFire, Membase, …) | Billions of rows | 10s-100s | Yes | Primarily memory
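To make the comparison easy to query, the table above can be encoded as data. This is an illustrative sketch only: the `Framework` type and the `needing_local_disk_for` helper are hypothetical names, not part of any product, and the values are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Framework:
    category: str
    examples: tuple[str, ...]
    data_scale: str
    cluster_scale: str
    computable_data: str  # "Yes" or "Some", as in the table
    local_disk_reasons: tuple[str, ...]  # empty when storage is primarily memory


FRAMEWORKS = (
    Framework("File system", ("Gluster", "Isilon"), "10s of PB", "100s",
              "Some", ("cost",)),
    Framework("Map-reduce", ("Hadoop",), "100s of PB", "1,000s",
              "Yes", ("cost", "bandwidth", "availability")),
    Framework("Big SQL", ("Greenplum", "Aster Data", "Netezza"), "PBs", "100s",
              "Some", ("cost", "bandwidth")),
    Framework("NoSQL", ("Cassandra", "HBase"), "Trillions of rows", "100s",
              "Some", ("cost", "availability")),
    Framework("In-memory", ("Redis", "GemFire", "Membase"), "Billions of rows",
              "10s-100s", "Yes", ()),
)


def needing_local_disk_for(reason: str) -> list[str]:
    """Return the categories whose row cites `reason` for using local disks."""
    return [f.category for f in FRAMEWORKS if reason in f.local_disk_reasons]


if __name__ == "__main__":
    print(needing_local_disk_for("bandwidth"))
    print(needing_local_disk_for("availability"))
```

Keeping the reasons as separate tuple entries, rather than free text, lets simple membership tests answer questions such as which frameworks rely on local disks for bandwidth.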
10
The Unified Analytics Cloud Platform
• Cloud Infrastructure (private and public): vSphere
• Data Platform
  - Database/DataStore: Cassandra, Greenplum, HBase, Voldemort, HDFS
  - Data PaaS: Data Director
• Developer Frameworks and PaaS: Cloud Foundry, Spring, Hadoop, Python, MADlib
• Analytics Tools: Datameer, Karmasphere, EMC Chorus, Tableau
11
Unifying the Big Data Platform using Virtualization
• Goals
  - Make it fast and easy to provision new data clusters on demand
  - Allow mixing of workloads
  - Leverage virtual machines to provide isolation (especially for multi-tenant use)
  - Optimize data performance based on virtual topologies
  - Make the system reliable based on virtual topologies
• Leveraging Virtualization
  - Elastic scale
  - Use high availability to protect key services, e.g., Hadoop’s NameNode/JobTracker
  - Resource controls and sharing: reuse underutilized memory and CPU
  - Prioritize workloads: limit or guarantee resource usage in a mixed environment
Cloud Infrastructure (private and public)
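To illustrate what "limit or guarantee resource usage" means, here is a simplified, hypothetical allocator in the spirit of reservations, limits, and shares: each VM first receives its reservation, then spare capacity is split in proportion to shares, never exceeding a VM's limit. This is a teaching model under stated assumptions, not vSphere's actual scheduling algorithm; the `VM` type, the `allocate` function, and the example numbers are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    reservation: float  # guaranteed minimum (e.g., MHz or MB)
    limit: float        # hard cap
    shares: int         # relative priority for spare capacity (must be > 0)


def allocate(capacity: float, vms: list[VM]) -> dict[str, float]:
    """Hypothetical water-filling: reservations first, then spare by shares, capped at limits."""
    alloc = {vm.name: vm.reservation for vm in vms}
    spare = capacity - sum(alloc.values())
    if spare < 0:
        raise ValueError("reservations exceed capacity")
    active = [vm for vm in vms if alloc[vm.name] < vm.limit]
    while spare > 1e-9 and active:
        total_shares = sum(vm.shares for vm in active)
        granted = 0.0
        for vm in active:
            # Offer a share-proportional slice, but never go past the limit.
            offer = spare * vm.shares / total_shares
            take = min(offer, vm.limit - alloc[vm.name])
            alloc[vm.name] += take
            granted += take
        spare -= granted
        # VMs that hit their limit drop out; leftover is redistributed next round.
        active = [vm for vm in active if alloc[vm.name] < vm.limit - 1e-9]
    return alloc


if __name__ == "__main__":
    vms = [
        VM("hadoop-worker", reservation=20, limit=100, shares=2000),
        VM("other-vm", reservation=10, limit=30, shares=1000),
    ]
    print(allocate(100, vms))
```

In the usage example, the other VM is capped at its limit of 30, so the capacity it cannot use flows to the Hadoop VM, which ends up with 70.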
12
A Unified Analytics Cloud Significantly Simplifies
[Diagram: Hadoop, NoSQL, and Big SQL decision-support clusters consolidated onto one unified analytics infrastructure, private and public]
• Simplify
  - Single hardware infrastructure
  - Faster, easier provisioning
• Optimize
  - Shared resources = higher utilization
  - Elastic resources = faster on-demand access
13
Use Local Disk Where It’s Needed
Storage | Cost per GB | Capacity per $1M | IOPS per $1M | Bandwidth per $1M
SAN storage | $2-$10 | 0.5 PB | 200,000 | 1 GB/s
NAS filers | $1-$5 | 1 PB | 400,000 | 2 GB/s
Local storage | $0.05 | 20 PB | 10,000,000 | 800 GB/s
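The capacity figures follow from dividing the $1M budget by the price per gigabyte, and they match the low end of each quoted price range. A quick check, assuming decimal units (1 PB = 10^6 GB); `petabytes_for_budget` is a hypothetical helper name.

```python
GB_PER_PB = 1_000_000  # decimal units: 1 PB = 10^6 GB


def petabytes_for_budget(usd_per_gb: float, budget_usd: float = 1_000_000) -> float:
    """Raw capacity, in PB, that `budget_usd` buys at `usd_per_gb`."""
    return budget_usd / usd_per_gb / GB_PER_PB


# Low end of each price range quoted on the slide (USD per GB).
PRICES = {"SAN storage": 2.00, "NAS filers": 1.00, "Local storage": 0.05}

if __name__ == "__main__":
    for tier, price in PRICES.items():
        print(f"{tier}: {petabytes_for_budget(price):g} PB per $1M")
    # At the high end of the SAN range ($10/GB), the same budget buys far less.
    print(f"SAN storage at $10/GB: {petabytes_for_budget(10.0):g} PB per $1M")
```

At the top of the SAN price range, $1M buys only 0.1 PB, which widens the gap with local storage further.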
14
VMware Is Committed to Being the Best Virtual Platform for Hadoop
• Performance studies and best practices
  - Studies through 2010-2011 of Hadoop 0.20 on vSphere 5
  - White paper, including detailed configurations and recommendations
• Making Hadoop run well on vSphere
  - Performance optimizations in vSphere releases
  - VMware engagement in the Hadoop community effort
  - Supporting key partners with their distributions on vSphere
  - Contributing enhancements to Hadoop
• Hadoop framework integration
  - Spring Hadoop: enabling Spring to simplify MapReduce jobs
  - Spring Batch: sophisticated batch management (Oozie on steroids)
15
Extend Virtual Storage Architecture to Include Local Disk
• Shared storage: SAN or NAS
  - Easy to provision
  - Automated cluster rebalancing
• Hybrid storage
  - SAN for boot images, VMs, and other workloads
  - Local disk for Hadoop and HDFS
  - Scalable bandwidth, lower cost per GB
[Diagram: six hosts, each running one or two Hadoop VMs alongside other VMs]
16
Performance Analysis of Big Data (Hadoop) on Virtualization
[Chart: ratio of elapsed time to native (y-axis, 0 to 1.2) for the 1 VM and 2 VMs configurations; lower is better]
Tested on vSphere 5.0
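The chart's metric is elapsed time on VMs divided by elapsed time on bare metal, so 1.0 means parity and values below 1.0 mean the virtualized run finished sooner. A minimal sketch of the computation; the timings are hypothetical placeholders, NOT results from the study, and `ratio_to_native` is an illustrative name.

```python
def ratio_to_native(virtual_seconds: float, native_seconds: float) -> float:
    """Elapsed time on VMs / elapsed time native; below 1.0 means faster than native."""
    if native_seconds <= 0:
        raise ValueError("native_seconds must be positive")
    return virtual_seconds / native_seconds


if __name__ == "__main__":
    # Hypothetical timings for one benchmark job (not measured data).
    native = 600.0
    for label, elapsed in (("1 VM", 630.0), ("2 VMs", 570.0)):
        print(label, round(ratio_to_native(elapsed, native), 3))
```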
17
Simplify Heterogeneous Data Management via Data PaaS
[Diagram layers: Cloud Infrastructure, Data Platform, Developer, Analytics Tools]
• Databases: file system, Big SQL, large-scale NoSQL, in-memory
• Data PaaS, a common data management layer: provisioning, management, multi-tenancy, data discovery, import/export
18
vFabric Data Director
vFabric Data Director Powers Database-as-a-Service
• Runs on VMware vSphere
• Capabilities: provisioning, backup/restore, clone, one-click HA, resource management, security management, database templates, monitoring
• Roles: DBA, app developer, IT admin
• Automation, self-service, policy-based control
• For existing applications and new applications
19
Data Systems: Databases, file systems
[Diagram layers: Cloud Infrastructure, Data Platform, Developer, Analytics Tools]
• Databases: file system, Big SQL, large-scale NoSQL, in-memory (spanning unstructured to structured data)
20
Technology: Databases and Data Stores for Big Data
Category | Types of data | Technologies | Values
File system | Log files, machine-generated data, documents, device data, etc. | NAS, HDFS, Blob (S3, Atmos, etc.) | Store any data, easy to scale out, can optimize for cost
Large-scale NoSQL | Loosely typed device data, records, events, statistics, complex relations/graphs | Cassandra, HBase, Voldemort | Easy to scale out; flexible and dynamic schemas
In-memory | Structured, partitionable data | GemFire, Redis, Membase | High throughput, low latency
Big SQL | Structured data | Greenplum, Sybase IQ, Aster Data, etc. | High performance for repetitive queries; easy-to-use query language
(Rows run from unstructured data at the top to structured data at the bottom.)
21
Simplified Developer Experience through PaaS
[Diagram layers: Cloud Infrastructure, Data Platform, Databases, Platform as a Service, Developer, Analytics Tools]
22
Spring Big Data Integrations
• NoSQL integration
  - Spring Data for MongoDB, GemFire, Riak, Neo4j, Blob, Cassandra
• Spring Hadoop
  - Announced this week at Strata!
  - Provides support for developing applications based on Hadoop technologies by leveraging the capabilities of the Spring ecosystem
• Spring Batch
  - Integration allows Hadoop jobs and HDFS operations to run as part of a workflow
23
The Unified Analytics Cloud Platform (repeat of slide 10)
24
Summary
• The revolution in big data is under way
  - Data-centric applications are now critical
• Hadoop on virtualization
  - Proven performance
  - The value of cloud and virtualization is apparent for Hadoop
• Simplify through a Unified Analytics Cloud
  - One platform for today’s and future big data systems
  - Better utilization
  - Faster deployment, elastic resources
  - Secure, isolated, multi-tenant capability for analytics
25
References
• Twitter
  - @richardmcdougll
• My CTO blog
  - http://communities.vmware.com/community/vmtn/cto/cloud
• Hadoop on vSphere
  - Talk @ Hadoop World
  - Performance paper: http://www.vmware.com/files/.../VMW-Hadoop-Performance-vSphere5.pdf
• Spring Hadoop
  - http://blog.springsource.org/2012/02/29/introducing-spring-hadoop

Contenu connexe

Tendances

Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse Optimization
Cloudera, Inc.
 
Big data, map reduce and beyond
Big data, map reduce and beyondBig data, map reduce and beyond
Big data, map reduce and beyond
datasalt
 

Tendances (20)

C* Summit EU 2013: Leveraging the Power of Cassandra: Operational Reporting a...
C* Summit EU 2013: Leveraging the Power of Cassandra: Operational Reporting a...C* Summit EU 2013: Leveraging the Power of Cassandra: Operational Reporting a...
C* Summit EU 2013: Leveraging the Power of Cassandra: Operational Reporting a...
 
Changing the game with cloud dw
Changing the game with cloud dwChanging the game with cloud dw
Changing the game with cloud dw
 
Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse Optimization
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Building Big Data Applications
Building Big Data ApplicationsBuilding Big Data Applications
Building Big Data Applications
 
A beginners guide to Cloudera Hadoop
A beginners guide to Cloudera HadoopA beginners guide to Cloudera Hadoop
A beginners guide to Cloudera Hadoop
 
Apache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos LinardosApache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos Linardos
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
 
Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache Hadoop
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructure
 
Alluxio - Virtual Unified File System
Alluxio - Virtual Unified File System Alluxio - Virtual Unified File System
Alluxio - Virtual Unified File System
 
Big data, map reduce and beyond
Big data, map reduce and beyondBig data, map reduce and beyond
Big data, map reduce and beyond
 
BlueData DataSheet
BlueData DataSheetBlueData DataSheet
BlueData DataSheet
 
Achieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloadsAchieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloads
 
Demystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWDemystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFW
 
Rescuing the Honey Bee with Kinetica, NVIDIA, and Microsoft
Rescuing the Honey Bee with Kinetica, NVIDIA, and MicrosoftRescuing the Honey Bee with Kinetica, NVIDIA, and Microsoft
Rescuing the Honey Bee with Kinetica, NVIDIA, and Microsoft
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
 
Webinar: Unlock the Power of Streaming Data with Kinetica and Confluent
Webinar: Unlock the Power of Streaming Data with Kinetica and ConfluentWebinar: Unlock the Power of Streaming Data with Kinetica and Confluent
Webinar: Unlock the Power of Streaming Data with Kinetica and Confluent
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
The Pandemic Changes Everything, the Need for Speed and Resiliency
The Pandemic Changes Everything, the Need for Speed and ResiliencyThe Pandemic Changes Everything, the Need for Speed and Resiliency
The Pandemic Changes Everything, the Need for Speed and Resiliency
 

En vedette

Daryn Gibson Bentley Transcript
Daryn Gibson Bentley TranscriptDaryn Gibson Bentley Transcript
Daryn Gibson Bentley Transcript
Daryn Gibson
 
Top 8 consultant dietitian resume samples
Top 8 consultant dietitian resume samplesTop 8 consultant dietitian resume samples
Top 8 consultant dietitian resume samples
LadyGaGa789
 
Karen, 2013-2014 Student Teaching Report
Karen, 2013-2014 Student Teaching ReportKaren, 2013-2014 Student Teaching Report
Karen, 2013-2014 Student Teaching Report
Karen Liu
 
Musical Jeopardy Throwdown
Musical Jeopardy ThrowdownMusical Jeopardy Throwdown
Musical Jeopardy Throwdown
LibDani
 
BPS Business Solutions Profile
BPS Business Solutions ProfileBPS Business Solutions Profile
BPS Business Solutions Profile
Ramaraja Sekhar Y
 
Top 8 conference facilitator resume samples
Top 8 conference facilitator resume samplesTop 8 conference facilitator resume samples
Top 8 conference facilitator resume samples
LadyGaGa789
 
LIN_delivering change_Introduction
LIN_delivering change_IntroductionLIN_delivering change_Introduction
LIN_delivering change_Introduction
Karan Mangat
 
Collection Acts and Regulations - Cross-Canada
Collection Acts and Regulations - Cross-CanadaCollection Acts and Regulations - Cross-Canada
Collection Acts and Regulations - Cross-Canada
François Sauvageau
 

En vedette (18)

Daryn Gibson Bentley Transcript
Daryn Gibson Bentley TranscriptDaryn Gibson Bentley Transcript
Daryn Gibson Bentley Transcript
 
AL RAFED EXPERIENCE
AL RAFED EXPERIENCEAL RAFED EXPERIENCE
AL RAFED EXPERIENCE
 
Top 8 consultant dietitian resume samples
Top 8 consultant dietitian resume samplesTop 8 consultant dietitian resume samples
Top 8 consultant dietitian resume samples
 
Karen, 2013-2014 Student Teaching Report
Karen, 2013-2014 Student Teaching ReportKaren, 2013-2014 Student Teaching Report
Karen, 2013-2014 Student Teaching Report
 
RHBC 166: Faithful in the Midst of a Spiritual War
RHBC 166: Faithful in the Midst of a Spiritual WarRHBC 166: Faithful in the Midst of a Spiritual War
RHBC 166: Faithful in the Midst of a Spiritual War
 
Musical Jeopardy Throwdown
Musical Jeopardy ThrowdownMusical Jeopardy Throwdown
Musical Jeopardy Throwdown
 
BPS Business Solutions Profile
BPS Business Solutions ProfileBPS Business Solutions Profile
BPS Business Solutions Profile
 
Could grexit be just around the corner the european union is on the verge of ...
Could grexit be just around the corner the european union is on the verge of ...Could grexit be just around the corner the european union is on the verge of ...
Could grexit be just around the corner the european union is on the verge of ...
 
Sampah dan kesehatan masyarakat
Sampah dan kesehatan masyarakatSampah dan kesehatan masyarakat
Sampah dan kesehatan masyarakat
 
Top 8 conference facilitator resume samples
Top 8 conference facilitator resume samplesTop 8 conference facilitator resume samples
Top 8 conference facilitator resume samples
 
Stellar catalogue 2015
Stellar catalogue 2015Stellar catalogue 2015
Stellar catalogue 2015
 
LIN_delivering change_Introduction
LIN_delivering change_IntroductionLIN_delivering change_Introduction
LIN_delivering change_Introduction
 
Collection Acts and Regulations - Cross-Canada
Collection Acts and Regulations - Cross-CanadaCollection Acts and Regulations - Cross-Canada
Collection Acts and Regulations - Cross-Canada
 
Oop ppt
Oop pptOop ppt
Oop ppt
 
Secrets
SecretsSecrets
Secrets
 
Test
TestTest
Test
 
Ben CV new
Ben CV newBen CV new
Ben CV new
 
mi primer slideshare
mi primer slidesharemi primer slideshare
mi primer slideshare
 

Similaire à Presentation architecting virtualized infrastructure for big data

Architecting virtualized infrastructure for big data presentation
Architecting virtualized infrastructure for big data presentationArchitecting virtualized infrastructure for big data presentation
Architecting virtualized infrastructure for big data presentation
Vlad Ponomarev
 
Is your cloud ready for Big Data? Strata NY 2013
Is your cloud ready for Big Data? Strata NY 2013Is your cloud ready for Big Data? Strata NY 2013
Is your cloud ready for Big Data? Strata NY 2013
Richard McDougall
 
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
StampedeCon
 

Similaire à Presentation architecting virtualized infrastructure for big data (20)

Architecting virtualized infrastructure for big data presentation
Architecting virtualized infrastructure for big data presentationArchitecting virtualized infrastructure for big data presentation
Architecting virtualized infrastructure for big data presentation
 
Is your cloud ready for Big Data? Strata NY 2013
Is your cloud ready for Big Data? Strata NY 2013Is your cloud ready for Big Data? Strata NY 2013
Is your cloud ready for Big Data? Strata NY 2013
 
Data core overview - haluk-final
Data core overview - haluk-finalData core overview - haluk-final
Data core overview - haluk-final
 
The Last Frontier- Virtualization, Hybrid Management and the Cloud
The Last Frontier-  Virtualization, Hybrid Management and the CloudThe Last Frontier-  Virtualization, Hybrid Management and the Cloud
The Last Frontier- Virtualization, Hybrid Management and the Cloud
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 
Solving enterprise challenges through scale out storage & big compute final
Solving enterprise challenges through scale out storage & big compute finalSolving enterprise challenges through scale out storage & big compute final
Solving enterprise challenges through scale out storage & big compute final
 
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
 
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
 
1. beyond mission critical virtualizing big data and hadoop
1. beyond mission critical   virtualizing big data and hadoop1. beyond mission critical   virtualizing big data and hadoop
1. beyond mission critical virtualizing big data and hadoop
 
WTIA Cloud Computing Series - Part I: The Fundamentals
WTIA Cloud Computing Series - Part I: The FundamentalsWTIA Cloud Computing Series - Part I: The Fundamentals
WTIA Cloud Computing Series - Part I: The Fundamentals
 
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
 
Data Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeData Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data Lake
 
Big Data Fabric: A Necessity For Any Successful Big Data Initiative
Big Data Fabric: A Necessity For Any Successful Big Data InitiativeBig Data Fabric: A Necessity For Any Successful Big Data Initiative
Big Data Fabric: A Necessity For Any Successful Big Data Initiative
 
Exploring the Wider World of Big Data
Exploring the Wider World of Big DataExploring the Wider World of Big Data
Exploring the Wider World of Big Data
 
Big Data and OSS at IBM
Big Data and OSS at IBMBig Data and OSS at IBM
Big Data and OSS at IBM
 
MT129 Isilon Data Lake Overview
MT129 Isilon Data Lake OverviewMT129 Isilon Data Lake Overview
MT129 Isilon Data Lake Overview
 
Azure Cafe Marketplace with Hortonworks March 31 2016
Azure Cafe Marketplace with Hortonworks March 31 2016Azure Cafe Marketplace with Hortonworks March 31 2016
Azure Cafe Marketplace with Hortonworks March 31 2016
 
DM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled ArchitectureDM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled Architecture
 
Scale-on-Scale : Part 1 of 3 - Production Environment
Scale-on-Scale : Part 1 of 3 - Production EnvironmentScale-on-Scale : Part 1 of 3 - Production Environment
Scale-on-Scale : Part 1 of 3 - Production Environment
 

Plus de solarisyourep

Presentation a new era in it
Presentation   a new era in itPresentation   a new era in it
Presentation a new era in it
solarisyourep
 
Presentation a vision for user centric computing
Presentation   a vision for user centric computingPresentation   a vision for user centric computing
Presentation a vision for user centric computing
solarisyourep
 
Presentation advanced management – the road ahead
Presentation   advanced management – the road aheadPresentation   advanced management – the road ahead
Presentation advanced management – the road ahead
solarisyourep
 
Presentation architecting a cloud infrastructure
Presentation   architecting a cloud infrastructurePresentation   architecting a cloud infrastructure
Presentation architecting a cloud infrastructure
solarisyourep
 
Presentation avoiding the 19 biggest ha & drs configuration mistakes
Presentation   avoiding the 19 biggest ha & drs configuration mistakesPresentation   avoiding the 19 biggest ha & drs configuration mistakes
Presentation avoiding the 19 biggest ha & drs configuration mistakes
solarisyourep
 
Presentation blade center foundation for cloud
Presentation   blade center foundation for cloudPresentation   blade center foundation for cloud
Presentation blade center foundation for cloud
solarisyourep
 
Presentation building and running your private cloud
Presentation   building and running your private cloudPresentation   building and running your private cloud
Presentation building and running your private cloud
solarisyourep
 
Presentation building your cloud with v mware
Presentation   building your cloud with v mwarePresentation   building your cloud with v mware
Presentation building your cloud with v mware
solarisyourep
 
Presentation business critical applications in a virtual env
Presentation   business critical applications in a virtual envPresentation   business critical applications in a virtual env
Presentation business critical applications in a virtual env
solarisyourep
 
Presentation cim1309 v cat 3.0 operating a v-mware cloud
Presentation   cim1309 v cat 3.0 operating a v-mware cloudPresentation   cim1309 v cat 3.0 operating a v-mware cloud
Presentation cim1309 v cat 3.0 operating a v-mware cloud
solarisyourep
 
Presentation cisco intelligent automation complementing and extending v mwa...
Presentation   cisco intelligent automation complementing and extending v mwa...Presentation   cisco intelligent automation complementing and extending v mwa...
Presentation cisco intelligent automation complementing and extending v mwa...
solarisyourep
 
Presentation cisco vxi–optimized infrastructure for scaling v mware view wi...
Presentation   cisco vxi–optimized infrastructure for scaling v mware view wi...Presentation   cisco vxi–optimized infrastructure for scaling v mware view wi...
Presentation cisco vxi–optimized infrastructure for scaling v mware view wi...
solarisyourep
 
Presentation cloud infrastructure and management – from v sphere to vcloud ...
Presentation   cloud infrastructure and management – from v sphere to vcloud ...Presentation   cloud infrastructure and management – from v sphere to vcloud ...
Presentation cloud infrastructure and management – from v sphere to vcloud ...
solarisyourep
 
Presentation cloud infrastructure launch – what’s new
Presentation   cloud infrastructure launch – what’s newPresentation   cloud infrastructure launch – what’s new
Presentation cloud infrastructure launch – what’s new
solarisyourep
 
Presentation cloud meets big
Presentation   cloud meets bigPresentation   cloud meets big
Presentation cloud meets big
solarisyourep
 
Presentation consuming a cloud
Presentation   consuming a cloudPresentation   consuming a cloud
Presentation consuming a cloud
solarisyourep
 
Presentation desktops for the cloud the view rollout
Presentation   desktops for the cloud the view rolloutPresentation   desktops for the cloud the view rollout
Presentation desktops for the cloud the view rollout
solarisyourep
 
Presentation disaster recovery in virtualization and cloud
Presentation   disaster recovery in virtualization and cloudPresentation   disaster recovery in virtualization and cloud
Presentation disaster recovery in virtualization and cloud
solarisyourep
 
Presentation drs advanced concepts, best practices and future directions
Presentation   drs advanced concepts, best practices and future directionsPresentation   drs advanced concepts, best practices and future directions
Presentation drs advanced concepts, best practices and future directions
solarisyourep
 
Presentation end-user computing in the post-pc era
Presentation   end-user computing in the post-pc eraPresentation   end-user computing in the post-pc era
Presentation end-user computing in the post-pc era
solarisyourep
 

Plus de solarisyourep (20)

Presentation a new era in it
Presentation   a new era in itPresentation   a new era in it
Presentation a new era in it
 
Presentation a vision for user centric computing
Presentation   a vision for user centric computingPresentation   a vision for user centric computing
Presentation a vision for user centric computing
 
Presentation advanced management – the road ahead
Presentation   advanced management – the road aheadPresentation   advanced management – the road ahead
Presentation advanced management – the road ahead
 
Presentation architecting a cloud infrastructure
Presentation   architecting a cloud infrastructurePresentation   architecting a cloud infrastructure
Presentation architecting a cloud infrastructure
 
Presentation avoiding the 19 biggest ha & drs configuration mistakes
Presentation   avoiding the 19 biggest ha & drs configuration mistakesPresentation   avoiding the 19 biggest ha & drs configuration mistakes
Presentation avoiding the 19 biggest ha & drs configuration mistakes
 
Presentation blade center foundation for cloud
Presentation   blade center foundation for cloudPresentation   blade center foundation for cloud
Presentation blade center foundation for cloud
 
Presentation building and running your private cloud
Presentation   building and running your private cloudPresentation   building and running your private cloud
Presentation building and running your private cloud
 
Presentation building your cloud with v mware
Presentation   building your cloud with v mwarePresentation   building your cloud with v mware
Presentation building your cloud with v mware
 
Presentation business critical applications in a virtual env
Presentation   business critical applications in a virtual envPresentation   business critical applications in a virtual env
Presentation business critical applications in a virtual env
 
Presentation cim1309 v cat 3.0 operating a v-mware cloud
Presentation   cim1309 v cat 3.0 operating a v-mware cloudPresentation   cim1309 v cat 3.0 operating a v-mware cloud
Presentation cim1309 v cat 3.0 operating a v-mware cloud
 
Presentation cisco intelligent automation complementing and extending v mwa...
Presentation   cisco intelligent automation complementing and extending v mwa...Presentation   cisco intelligent automation complementing and extending v mwa...
Presentation cisco intelligent automation complementing and extending v mwa...
 
Presentation cisco vxi–optimized infrastructure for scaling v mware view wi...
Presentation   cisco vxi–optimized infrastructure for scaling v mware view wi...Presentation   cisco vxi–optimized infrastructure for scaling v mware view wi...
Presentation cisco vxi–optimized infrastructure for scaling v mware view wi...
 
Presentation cloud infrastructure and management – from v sphere to vcloud ...
Presentation   cloud infrastructure and management – from v sphere to vcloud ...Presentation   cloud infrastructure and management – from v sphere to vcloud ...
Presentation cloud infrastructure and management – from v sphere to vcloud ...
 
Presentation cloud infrastructure launch – what’s new
Presentation   cloud infrastructure launch – what’s newPresentation   cloud infrastructure launch – what’s new
Presentation cloud infrastructure launch – what’s new
 
Presentation cloud meets big
Presentation   cloud meets bigPresentation   cloud meets big
Presentation cloud meets big
 
Presentation consuming a cloud
Presentation   consuming a cloudPresentation   consuming a cloud
Presentation consuming a cloud
 
Presentation desktops for the cloud the view rollout
Presentation   desktops for the cloud the view rolloutPresentation   desktops for the cloud the view rollout
Presentation desktops for the cloud the view rollout
 
Presentation disaster recovery in virtualization and cloud
Presentation   disaster recovery in virtualization and cloudPresentation   disaster recovery in virtualization and cloud
Presentation disaster recovery in virtualization and cloud
 
Presentation drs advanced concepts, best practices and future directions
Presentation   drs advanced concepts, best practices and future directionsPresentation   drs advanced concepts, best practices and future directions
Presentation drs advanced concepts, best practices and future directions
 
Presentation end-user computing in the post-pc era
Presentation   end-user computing in the post-pc eraPresentation   end-user computing in the post-pc era
Presentation end-user computing in the post-pc era
 

Presentation architecting virtualized infrastructure for big data

  • 1. © 2009 VMware Inc. All rights reserved Architecting Virtualized Infrastructure for Big Data Richard McDougall @richardmcdougll CTO, Application Infrastructure, Big Data Lead, VMware, Inc
  • 2. 2 Cloud: Big Shifts in Simplification and Optimization 2. Dramatically Lower Costs to redirect investment into value-add opportunities 3. Enable Flexible, Agile IT Service Delivery to meet and anticipate the needs of the business 1. Reduce the Complexity to simplify operations and maintenance
  • 3. 3 Infrastructure, Apps and now Data… Private Public Build Run Manage Simplify Infrastructure With Cloud Simplify App Platform Through PaaS Simplify Data
  • 4. 4 Trend 1/3: New Data Growing at 60% Y/Y Source: The Information Explosion, 2009 medical(imaging,( sensors( cad/cam,(appliances,(machine(data,(digital(movies( digital(photos( digital(tv( audio( camera(phones,(rfid( satellite(images,(logs,(scanners,(twi7er( Exabytes of information stored 20 Zetta by 2015 1 Yotta by 2030 Yes, you are part of the yotta generation…
  • 5. Data Growth in the Enterprise
  • 6. Trend 2/3: Big Data – Driven by Real-World Benefit
  • 7. Trend 3/3: Value from Data Exceeds Hardware Cost
    - Value from the intelligence of data analytics now outstrips the cost of hardware
      - Hadoop enables the use of 10x lower-cost hardware
      - Hardware cost is halving every 18 months
    - Chart (value vs. cost): big iron at $40k/CPU; commodity cluster at $1k/CPU
  • 8. A Holistic View of a Big Data System
    - Components: ETL; real-time streams; unstructured data (HDFS); real-time structured databases (HBase, GemFire, Cassandra); Big SQL (Greenplum, Aster Data, etc.); batch processing; real-time processing (S4, Storm); analytics
  • 9. Big Data Frameworks and Characteristics (scale of data / scale of cluster / computable data? / local disks?)
    - File system (Gluster, Isilon, etc.): 10s of PB / 100s of nodes / some / yes, for cost
    - Map-reduce (Hadoop): 100s of PB / 1,000s of nodes / yes / yes, for cost, bandwidth, and availability
    - Big SQL (Greenplum, Aster Data, Netezza, etc.): PBs / 100s of nodes / some / yes, for cost and bandwidth
    - NoSQL (Cassandra, HBase, etc.): trillions of rows / 100s of nodes / some / yes, for cost and availability
    - In-memory (Redis, GemFire, Membase, etc.): billions of rows / 10s-100s of nodes / yes / primarily memory
  • 10. The Unified Analytics Cloud Platform
    - Layers: cloud infrastructure (vSphere; private and public), data platform, developer frameworks, analytics tools
    - Databases/data stores: Cassandra, Greenplum, HBase, Voldemort, HDFS
    - Other components shown: Data PaaS, PaaS, Hadoop, Python, MADlib, Cloud Foundry, Spring, Data Director, Datameer, Karmasphere, EMC Chorus, Tableau
  • 11. Unifying the Big Data Platform Using Virtualization
    - Goals
      - Make it fast and easy to provision new data clusters on demand
      - Allow mixing of workloads
      - Leverage virtual machines to provide isolation (especially for multi-tenancy)
      - Optimize data performance based on virtual topologies
      - Make the system reliable based on virtual topologies
    - Leveraging virtualization
      - Elastic scale
      - Use high availability to protect key services, e.g., Hadoop's NameNode and JobTracker
      - Resource controls and sharing: reuse underutilized memory and CPU
      - Prioritize workloads: limit or guarantee resource usage in a mixed environment
    - Diagram: cloud infrastructure, private and public
  • 12. A Unified Analytics Cloud Significantly Simplifies
    - Diagram: SQL, Hadoop, NoSQL, Big SQL, and decision-support clusters on one unified analytics infrastructure (private and public)
    - Simplify
      - Single hardware infrastructure
      - Faster, easier provisioning
    - Optimize
      - Shared resources = higher utilization
      - Elastic resources = faster on-demand access
  • 13. Use Local Disk Where It's Needed
    - SAN storage: $2-$10/GB; $1M buys 0.5 PB, 200,000 IOPS, 1 GB/s
    - NAS filers: $1-$5/GB; $1M buys 1 PB, 400,000 IOPS, 2 GB/s
    - Local storage: $0.05/GB; $1M buys 20 PB, 10,000,000 IOPS, 800 GB/s
  • 14. VMware Is Committed to Be the Best Virtual Platform for Hadoop
    - Performance studies and best practices
      - Studies through 2010-2011 of Hadoop 0.20 on vSphere 5
      - White paper, including detailed configurations and recommendations
    - Making Hadoop run well on vSphere
      - Performance optimizations in vSphere releases
      - VMware engagement in the Hadoop community effort
      - Supporting key partners with their distributions on vSphere
      - Contributing enhancements to Hadoop
    - Hadoop framework integration
      - Spring Hadoop: enabling Spring to simplify map-reduce jobs
      - Spring Batch: sophisticated batch management ("Oozie on steroids")
  • 15. Extend Virtual Storage Architecture to Include Local Disk
    - Shared storage (SAN or NAS)
      - Easy to provision
      - Automated cluster rebalancing
    - Hybrid storage
      - SAN for boot images, VMs, and other workloads
      - Local disk for Hadoop and HDFS
      - Scalable bandwidth, lower cost per GB
    - Diagram: six hosts, each running one or two Hadoop VMs alongside other VMs
  • 16. Performance Analysis of Big Data (Hadoop) on Virtualization
    - Chart: ratio of time taken relative to native (lower is better), for 1 VM and 2 VMs
    - Tested on vSphere 5.0
  • 17. Simplify Heterogeneous Data Management via Data PaaS
    - Layers: cloud infrastructure, data platform, developer, analytics tools
    - Data systems: databases, file system, Big SQL, large-scale NoSQL, in-memory
    - Data PaaS, a common data management layer: provisioning, management, multi-tenancy, data discovery, import/export
  • 18. vFabric Data Director Powers Database-as-a-Service
    - Platform: VMware vSphere
    - Capabilities: provisioning, backup/restore, clone, one-click HA, resource management, security management, database templates, monitoring
    - Roles: DBA, app developer, IT admin
    - Automation, self-service, and policy-based control for existing and new applications
  • 19. Data Systems: Databases, File Systems
    - Layers: cloud infrastructure, data platform, developer, analytics tools
    - Data systems: file system, Big SQL, large-scale NoSQL, in-memory, spanning unstructured to structured data
  • 20. Technology: Databases and Data Stores for Big Data (spanning unstructured to structured data)
    - File system
      - Types of data: log files, machine-generated data, documents, device data, etc.
      - Technologies: NAS, HDFS, blob stores (S3, Atmos, etc.)
      - Values: store any data, easy to scale out, can optimize for cost
    - Large-scale NoSQL
      - Types of data: loosely typed device data, records, events, statistics, complex relations/graphs
      - Technologies: Cassandra, HBase, Voldemort
      - Values: easy to scale out, flexible and dynamic schemas
    - In-memory
      - Types of data: structured, partitionable data
      - Technologies: GemFire, Redis, Membase
      - Values: high throughput, low latency
    - Big SQL
      - Types of data: structured data
      - Technologies: Greenplum, Sybase IQ, Aster Data, etc.
      - Values: high performance for repetitive queries; ease of query language
  • 21. Simplified Developer Experience through PaaS
    - Layers: cloud infrastructure, data platform (databases), platform as a service, developer, analytics tools
  • 22. Spring Big Data Integrations
    - NoSQL integration
      - Spring Data for MongoDB, GemFire, Riak, Neo4j, Blob, Cassandra
    - Spring Hadoop
      - Announced this week at Strata!
      - Provides support for developing applications based on Hadoop technologies by leveraging the capabilities of the Spring ecosystem
    - Spring Batch
      - Integration allows Hadoop jobs and HDFS operations to run as part of a workflow
  • 23. The Unified Analytics Cloud Platform (repeat of slide 10)
  • 24. Summary
    - The revolution in big data is under way
      - Data-centric applications are now critical
    - Hadoop on virtualization
      - Proven performance
      - Cloud/virtualization values apparent for Hadoop use
    - Simplify through a unified analytics cloud
      - One platform for today's and future big-data systems
      - Better utilization
      - Faster deployment, elastic resources
      - Secure, isolated, multi-tenant capability for analytics
  • 25. References
    - Twitter: @richardmcdougll
    - My CTO blog: http://communities.vmware.com/community/vmtn/cto/cloud
    - Hadoop on vSphere
      - Talk @ Hadoop World
      - Performance paper: http://www.vmware.com/files/.../VMW-Hadoop-Performance-vSphere5.pdf
    - Spring Hadoop: http://blog.springsource.org/2012/02/29/introducing-spring-hadoop
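Slide 4 pairs a growth rate (new data growing 60% year over year) with two projections of total information stored. The short Python sketch below is an illustration added here, not material from the talk; it shows the compound-growth arithmetic behind such figures. A constant 60% annual rate doubles the volume roughly every 1.5 years, while going from 20 ZB in 2015 to 1 YB in 2030 corresponds to about 30% per year. The slide's two numbers describe different quantities (new data versus total stored), so the rates need not agree.

```python
import math

def doubling_time_years(annual_growth: float) -> float:
    """Years for a quantity to double at a constant annual growth rate (0.60 means 60%)."""
    return math.log(2) / math.log(1 + annual_growth)

def implied_annual_growth(start: float, end: float, years: float) -> float:
    """Constant annual growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1

# Slide 4 figures, in zettabytes: 20 ZB stored by 2015 and 1 YB (1,000 ZB) by 2030.
print(f"Doubling time at 60% per year: {doubling_time_years(0.60):.2f} years")
print(f"Implied 2015-2030 growth of stored data: {implied_annual_growth(20, 1000, 15):.1%}")
```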
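The cost claims on slide 7 are also simple exponential arithmetic. This minimal sketch assumes the 18-month halving applies smoothly and indefinitely, which is an idealization rather than a forecast. It also shows that the slide's per-CPU prices put big iron at 40 times the cost of a commodity CPU.

```python
def cost_after(months: float, initial_cost: float, halving_months: float = 18.0) -> float:
    """Cost of the same capability after `months`, assuming it halves every `halving_months`."""
    return initial_cost * 0.5 ** (months / halving_months)

BIG_IRON_PER_CPU = 40_000   # slide 7: "Big Iron: $40k/CPU"
COMMODITY_PER_CPU = 1_000   # slide 7: "Commodity Cluster: $1k/CPU"

print(f"Big iron costs {BIG_IRON_PER_CPU / COMMODITY_PER_CPU:.0f}x more per CPU")
for years in (1.5, 3.0, 4.5):
    # Smooth exponential decay: an idealization of "halving every 18 months".
    price = cost_after(years * 12, COMMODITY_PER_CPU)
    print(f"Commodity CPU after {years} years: ${price:,.2f}")
```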
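Slide 8 separates batch processing from real-time processing (S4, Storm). The following minimal, framework-free Python sketch illustrates the difference. The batch path follows the map, shuffle, and reduce phases of the map-reduce model over a complete stored dataset. The streaming path updates running counts one event at a time. All function and class names are hypothetical; this is neither Hadoop's nor Storm's API.

```python
from collections import Counter, defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for each word in a record, as a word-count mapper would.
    for word in record.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key (the framework does this between phases).
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Sum the grouped values for each key.
    return {key: sum(values) for key, values in groups.items()}

def batch_word_count(records: Iterable[str]) -> Dict[str, int]:
    """Batch: process the whole stored dataset, then publish the result."""
    return reduce_phase(shuffle(chain.from_iterable(map_phase(r) for r in records)))

class StreamingWordCount:
    """Real-time: keep running counts that are current after every event."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def on_event(self, record: str) -> None:
        self.counts.update(word for word, _ in map_phase(record))
```

Fed the same records, both paths end with identical totals. The difference is when answers become available: the batch result appears only after the whole dataset has been processed, while the streaming counts are up to date after each event.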
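Slide 11's "limit or guarantee resource usage" corresponds to the reservation, limit, and shares settings that vSphere exposes for VMs and resource pools. The sketch below is a deliberately simplified, single-resource model of how those three settings interact when capacity is contended. It is not the algorithm that the ESXi scheduler or DRS implements, and it assumes every VM wants as much as it can get. Real schedulers also account for each VM's actual demand. The `VM` class and `allocate` function are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class VM:
    name: str
    shares: float              # relative priority; must be > 0
    reservation: float = 0.0   # guaranteed minimum allocation
    limit: float = math.inf    # hard cap on allocation

def allocate(capacity: float, vms: List[VM], iterations: int = 200) -> Dict[str, float]:
    """Divide `capacity` in proportion to shares, clamped to each VM's [reservation, limit].

    Bisects on a scale factor `lam` until sum(clamp(shares * lam)) equals capacity.
    """
    for v in vms:
        if v.shares <= 0 or v.reservation > v.limit:
            raise ValueError(f"invalid settings for {v.name}")
    if sum(v.reservation for v in vms) > capacity:
        raise ValueError("reservations exceed capacity")

    def clamp(v: VM, lam: float) -> float:
        return min(max(v.shares * lam, v.reservation), v.limit)

    if sum(v.limit for v in vms) <= capacity:
        return {v.name: v.limit for v in vms}  # every VM is held at its limit

    # At `hi`, every VM is at its limit or one uncapped VM alone reaches capacity,
    # so the clamped total is >= capacity; at lo = 0 it is the sum of reservations.
    lo = 0.0
    hi = max((capacity if math.isinf(v.limit) else v.limit) / v.shares for v in vms)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if sum(clamp(v, mid) for v in vms) < capacity:
            lo = mid
        else:
            hi = mid
    return {v.name: clamp(v, hi) for v in vms}
```

For example, with 100 units, shares of 2:1 split capacity roughly 67/33. Capping the first VM at 50 pushes the split to 50/50. Reserving 40 for the second VM (without a cap on the first) yields 60/40.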
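Slide 12's "shared resources = higher utilization" follows from sizing. Separate clusters must each be sized for their own peak, while a shared pool only needs to cover the peak of the combined load. This minimal sketch uses made-up demand figures. The workload names and numbers are hypothetical and were chosen so that the peaks do not coincide. When all peaks do coincide, pooling saves nothing.

```python
from typing import Dict, List

def dedicated_capacity(demand: Dict[str, List[float]]) -> float:
    """Separate clusters: each one must be sized for its own peak."""
    return sum(max(series) for series in demand.values())

def pooled_capacity(demand: Dict[str, List[float]]) -> float:
    """One shared pool: sized for the peak of the combined demand at any time step."""
    return max(sum(step) for step in zip(*demand.values()))

def utilization(demand: Dict[str, List[float]], capacity: float) -> float:
    """Average fraction of `capacity` in use across all time steps."""
    steps = {len(series) for series in demand.values()}
    assert len(steps) == 1, "all demand series must cover the same time steps"
    used = sum(sum(series) for series in demand.values())
    return used / (capacity * steps.pop())

# Hypothetical CPU demand (cores) over four time periods for three workloads.
demand = {
    "hadoop_batch": [80, 20, 10, 10],
    "big_sql":      [20, 60, 70, 30],
    "nosql":        [30, 30, 30, 30],
}
for label, cap in (("dedicated", dedicated_capacity(demand)), ("pooled", pooled_capacity(demand))):
    print(f"{label}: {cap} cores, {utilization(demand, cap):.0%} average utilization")
```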
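The "$1M gets" capacities on slide 13 follow directly from the low end of each $/GB range. Here is a quick check, counting raw capacity only (no RAID, replication, or controller overhead) and using decimal units.

```python
def petabytes_per_budget(budget_usd: float, usd_per_gb: float) -> float:
    """Raw capacity a budget buys at a given price, in decimal units (1 PB = 1,000,000 GB)."""
    return budget_usd / usd_per_gb / 1_000_000

# Low end of each price range quoted on slide 13.
for tier, usd_per_gb in (("SAN", 2.00), ("NAS filer", 1.00), ("Local disk", 0.05)):
    pb = petabytes_per_budget(1_000_000, usd_per_gb)
    print(f"{tier}: ${usd_per_gb:.2f}/GB -> {pb:g} PB per $1M")
```

These are raw figures. Under HDFS's default replication factor of 3, the usable capacity on local disks would be roughly a third of the raw number.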
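Slide 15's "scalable bandwidth" for local disks comes from the fact that each host added to the cluster brings its own disks, whereas a shared array's bandwidth is fixed by the array. The back-of-the-envelope sketch below uses a hypothetical disk count and per-disk throughput. Real clusters are often limited by network or CPU before disk.

```python
def aggregate_local_bandwidth_gb_s(hosts: int, disks_per_host: int, mb_s_per_disk: float) -> float:
    """Total sequential bandwidth (GB/s) when every host streams from its own local disks.

    Assumes reads are spread evenly and nothing else (network, CPU, controller) is the bottleneck.
    """
    return hosts * disks_per_host * mb_s_per_disk / 1000

SHARED_ARRAY_GB_S = 1.0  # slide 13: $1M of SAN storage gets about 1 GB/s, regardless of host count

for hosts in (10, 50, 100):
    # Hypothetical configuration: 8 data disks per host at 100 MB/s each.
    local = aggregate_local_bandwidth_gb_s(hosts, disks_per_host=8, mb_s_per_disk=100)
    print(f"{hosts:>3} hosts: local disks {local:.0f} GB/s vs shared array {SHARED_ARRAY_GB_S:.0f} GB/s")
```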
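Slide 11's goal of making the system reliable based on virtual topologies matters once several Hadoop VMs share a physical host, as in slide 15's diagram. If HDFS treated each VM as an independent node, two replicas of a block could land on the same host, and a single host failure would lose both. The following is a simplified sketch of host-aware placement. The function and topology are hypothetical, and HDFS's actual block placement policy is more involved: it is rack-aware and configured through Hadoop's network topology settings. The default of 3 replicas mirrors HDFS's default replication factor.

```python
import random
from typing import Dict, List, Optional

def place_replicas(vm_to_host: Dict[str, str], writer_vm: str, replication: int = 3,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Pick VMs for a block's replicas so that no two replicas share a physical host.

    The first replica stays on the writer's VM (data locality); each further replica
    goes to a randomly chosen VM on a physical host that holds no replica yet.
    """
    rng = rng or random.Random()
    chosen = [writer_vm]
    used_hosts = {vm_to_host[writer_vm]}
    candidates = [vm for vm in vm_to_host if vm != writer_vm]
    rng.shuffle(candidates)
    for vm in candidates:
        if len(chosen) == replication:
            break
        if vm_to_host[vm] not in used_hosts:
            chosen.append(vm)
            used_hosts.add(vm_to_host[vm])
    if len(chosen) < replication:
        raise ValueError("not enough distinct physical hosts for this replication factor")
    return chosen

# Hypothetical topology: several Hadoop VMs per physical host.
topology = {"vm1a": "host1", "vm1b": "host1", "vm2a": "host2", "vm2b": "host2", "vm3a": "host3"}
print(place_replicas(topology, "vm1a", rng=random.Random(42)))
```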
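Slide 20 lists "easy to scale out" as a value of large-scale NoSQL stores such as Cassandra and Voldemort. Both partition keys across nodes with variants of consistent hashing. With this technique, adding a node moves only the keys that the new node takes over instead of reshuffling everything. The sketch below is a generic illustration of the technique, not either system's partitioner; the class and parameter names are hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple

class HashRing:
    """Minimal consistent-hash ring with virtual nodes (vnodes) per physical node."""

    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        self._ring: List[Tuple[int, str]] = []
        for node in nodes:
            self.add(node, vnodes)

    @staticmethod
    def _hash(key: str) -> int:
        # MD5 is used only to spread keys evenly, not for security.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node: str, vnodes: int = 64) -> None:
        for i in range(vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # Owner is the first ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

Adding a fourth node to a three-node ring reassigns roughly a quarter of the keys, all of them to the new node. With naive `hash(key) % n` placement, most keys would move.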