© 2009 VMware Inc. All rights reserved
Architecting Virtualized Infrastructure for Big Data
Richard McDougall
@richardmcdougll
CTO, Application Infrastructure, Big Data Lead, VMware, Inc
2
Cloud: Big Shifts in Simplification and Optimization
1. Reduce the Complexity to simplify operations and maintenance
2. Dramatically Lower Costs to redirect investment into value-add opportunities
3. Enable Flexible, Agile IT Service Delivery to meet and anticipate the needs of the business
3
Infrastructure, Apps and now Data…
Private, Public
Build, Run, Manage
Simplify Infrastructure With Cloud
Simplify App Platform Through PaaS
Simplify Data
4
Trend 1/3: New Data Growing at 60% Y/Y
Source: The Information Explosion, 2009
Data sources: medical imaging, sensors, CAD/CAM, appliances, videoconferencing, digital movies, digital photos, digital TV, audio, camera phones, RFID, satellite images, games, scanners, Twitter
Exabytes of information stored: 20 zettabytes by 2015, 1 yottabyte by 2030
Yes, you are part of the yotta generation…
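The 60% year-over-year figure compounds quickly. The sketch below works through that arithmetic under the assumption of a constant growth rate (the slide gives only the rate itself); the function names are illustrative and not taken from the deck.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def projected_size(initial: float, annual_growth: float, years: float) -> float:
    """Size after `years` of compound growth, in the same unit as `initial`."""
    return initial * (1 + annual_growth) ** years


if __name__ == "__main__":
    # 0.60 is the slide's 60% Y/Y growth figure; constant growth is an assumption.
    print(f"Doubling time: {doubling_time_years(0.60):.2f} years")
    print(f"Growth factor over 5 years: {projected_size(1.0, 0.60, 5):.2f}x")
```

At this rate the volume of new data roughly doubles every year and a half, which is why capacity planning on multi-year horizons is dominated by the growth term.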
5
Data Growth in the Enterprise
6
Trend 2/3: Big Data – Driven by Real-World Benefit
7
Trend 3/3: Value from Data Exceeds Hardware Cost
▪ Value from the intelligence of data analytics now outstrips the cost of hardware
• Hadoop enables the use of 10x lower cost hardware
• Hardware cost halving every 18 months
Big Iron: $40k/CPU
Commodity Cluster: $1k/CPU
Chart labels: Value, Cost
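As a rough illustration of the "halving every 18 months" claim, the sketch below projects a per-CPU cost forward in time. The exponential-decay model and the function name are assumptions made for illustration; only the 18-month period and the two per-CPU prices come from the slide.

```python
BIG_IRON_PER_CPU = 40_000.0   # from the slide: $40k/CPU
COMMODITY_PER_CPU = 1_000.0   # from the slide: $1k/CPU


def projected_cost(cost_now: float, months: float,
                   halving_period_months: float = 18.0) -> float:
    """Cost after `months`, assuming it halves every `halving_period_months`."""
    return cost_now * 0.5 ** (months / halving_period_months)


if __name__ == "__main__":
    for months in (0, 18, 36, 54):
        cost = projected_cost(COMMODITY_PER_CPU, months)
        print(f"{months:>2} months: ${cost:,.0f} per commodity CPU")
    ratio = BIG_IRON_PER_CPU / COMMODITY_PER_CPU
    print(f"Big iron costs {ratio:.0f}x more per CPU at the slide's prices")
```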
8
A Holistic View of a Big Data System:
ETL
Real-Time Streams
Unstructured Data (HDFS)
Real-Time Structured Database (HBase, GemFire, Cassandra)
Big SQL (Greenplum, Aster Data, etc.)
Batch Processing
Real-Time Processing (S4, Storm)
Analytics
9
Big Data Frameworks and Characteristics
Framework | Scale of data | Scale of cluster | Computable data? | Local disks?
File System: Gluster, Isilon, etc. | 10s of PB | 100s | No | Yes, for cost
Map-reduce: Hadoop | 100s of PB | 1,000s | Yes | Yes, for cost and bandwidth
Big-SQL: Greenplum, Aster Data, Netezza, … | PBs | 100s | No | Yes, for cost and bandwidth
No-SQL: Cassandra, HBase, … | Trillions of rows | 100s | Future | Yes, for cost and availability
In-Memory: Redis, GemFire, Membase, … | Billions of rows | 10s-100s | Hybrid possible | Primarily memory
10
The Unified Analytics Cloud Platform
Layers: Cloud Infrastructure (Private, Public), Data Platform, Developer Frameworks, Analytics Tools
Components shown: vSphere; Database/DataStore (Cassandra, Greenplum, HBase, Voldemort, HDFS); Data PaaS; PaaS; Hadoop; Python; MADlib; Cloud Foundry; Datameer; Karmasphere; Spring; Data-Director; EMC Chorus; Tableau
11
Unifying the Big Data Platform using Virtualization
▪ Goals
• Make it fast and easy to provision new data clusters on demand
• Allow mixing of workloads
• Leverage virtual machines to provide isolation (esp. for multi-tenant)
• Optimize data performance based on virtual topologies
• Make the system reliable based on virtual topologies
▪ Leveraging Virtualization
• Elastic scale
• Use high availability to protect key services, e.g., Hadoop's NameNode/JobTracker
• Resource controls and sharing: re-use underutilized memory and CPU
• Prioritize workloads: limit or guarantee resource usage in a mixed environment
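To make "limit or guarantee resource usage" concrete, here is a minimal sketch of one way a shared pool could be divided among mixed workloads. It is an illustrative model only, not vSphere's scheduler: the `Workload` fields, the `allocate` function, and all numbers are hypothetical. Each workload first receives its guaranteed minimum, then spare capacity is handed out in proportion to shares, never exceeding a workload's demand or cap.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    demand: float       # what the workload wants right now
    reservation: float  # guaranteed minimum
    limit: float        # hard cap
    shares: float       # relative weight for spare capacity (must be > 0)


def allocate(capacity: float, workloads: list[Workload]) -> dict[str, float]:
    """Hypothetical allocator: reservations first, then share-weighted spare capacity."""
    alloc = {w.name: min(w.demand, w.reservation) for w in workloads}
    remaining = capacity - sum(alloc.values())
    if remaining < 0:
        raise ValueError("reservations exceed capacity")

    def headroom(w: Workload) -> float:
        # How much more this workload may still receive.
        return min(w.demand, w.limit) - alloc[w.name]

    active = [w for w in workloads if headroom(w) > 1e-9]
    while remaining > 1e-9 and active:
        total_shares = sum(w.shares for w in active)
        budget = remaining  # offers in this round are based on this snapshot
        for w in active:
            grant = min(budget * w.shares / total_shares, headroom(w))
            alloc[w.name] += grant
            remaining -= grant
        # Workloads that hit their demand or limit drop out; leftovers are re-offered.
        active = [w for w in active if headroom(w) > 1e-9]
    return alloc


if __name__ == "__main__":
    pool = [
        Workload("hadoop", demand=80, reservation=20, limit=100, shares=2),
        Workload("sql", demand=50, reservation=30, limit=40, shares=1),
        Workload("nosql", demand=10, reservation=0, limit=100, shares=1),
    ]
    print(allocate(100, pool))
```

In this made-up example the SQL workload is held at its cap, the NoSQL workload gets all it asks for, and the Hadoop workload absorbs whatever is left, which is the kind of mixed-environment behaviour the bullets describe.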
12
A Unified Analytics Cloud Significantly Simplifies
Separate clusters: SQL Cluster, Hadoop Cluster, Decision Support Cluster, NoSQL Cluster
Unified Analytics Infrastructure (Private, Public): Big SQL, Hadoop, NoSQL
▪ Simplify
• Single hardware infrastructure
• Faster/easier provisioning
▪ Optimize
• Shared resources = higher utilization
• Elastic resources = faster on-demand access
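The "shared resources = higher utilization" point can be illustrated with a small sizing comparison. The demand figures below are invented for illustration and do not come from the deck; the sketch only shows why a pool sized for the combined peak needs less hardware than separate clusters each sized for their own peak, provided the peaks do not coincide.

```python
# Hypothetical demand per time slot, in arbitrary capacity units.
DEMANDS = {
    "hadoop": [80, 20, 10, 10],
    "big_sql": [10, 60, 20, 10],
    "nosql": [10, 10, 50, 30],
}


def capacity_separate(demands: dict[str, list[int]]) -> int:
    """Each cluster is sized for its own peak."""
    return sum(max(series) for series in demands.values())


def capacity_shared(demands: dict[str, list[int]]) -> int:
    """One pool is sized for the peak of the combined demand."""
    return max(sum(slot) for slot in zip(*demands.values()))


def average_utilization(demands: dict[str, list[int]], capacity: int) -> float:
    """Total work done divided by total capacity provided over all slots."""
    slots = len(next(iter(demands.values())))
    total = sum(sum(series) for series in demands.values())
    return total / (capacity * slots)


if __name__ == "__main__":
    for label, cap in (("separate", capacity_separate(DEMANDS)),
                       ("shared", capacity_shared(DEMANDS))):
        util = average_utilization(DEMANDS, cap)
        print(f"{label:>8}: capacity={cap}, average utilization={util:.0%}")
```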
13
Use Local Disk where it’s Needed
Storage type | Cost per gigabyte | $1M gets: capacity | $1M gets: IOPS | $1M gets: bandwidth
SAN Storage | $2 - $10 | 0.5 petabytes | 200,000 | 1 Gbyte/sec
NAS Filers | $1 - $5 | 1 petabyte | 400,000 | 2 Gbytes/sec
Local Storage | $0.05 | 20 petabytes | 10,000,000 | 800 Gbytes/sec
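The "$1M gets" capacity figures follow from dividing the budget by the price per gigabyte. The sketch below reproduces them, assuming decimal units (1 petabyte = 1,000,000 gigabytes) and the low end of each quoted price range, which is what the slide's capacities appear to correspond to. The names are illustrative.

```python
BUDGET_USD = 1_000_000
GB_PER_PB = 1_000_000  # decimal units; an assumption about the slide's arithmetic

# Low end of each price range quoted on the slide, in USD per gigabyte.
PRICE_PER_GB = {
    "SAN": 2.00,
    "NAS": 1.00,
    "Local": 0.05,
}


def capacity_pb(budget_usd: float, usd_per_gb: float) -> float:
    """Petabytes of raw capacity a budget buys at a given price per gigabyte."""
    return budget_usd / usd_per_gb / GB_PER_PB


if __name__ == "__main__":
    for kind, price in PRICE_PER_GB.items():
        print(f"{kind:>5}: {capacity_pb(BUDGET_USD, price):g} PB for $1M")
```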
14
VMware is Committed to the Best Virtual Platform for Hadoop
▪ Performance Studies and Best Practices
• Studies through 2010-2011 of Hadoop 0.20 on vSphere 5
• White paper, including detailed configurations and recommendations
▪ Making Hadoop run well on vSphere
• Performance optimizations in vSphere releases
• VMware engagement in Hadoop community effort
• Supporting key partners with their distributions on vSphere
• Contributing enhancements to Hadoop
▪ Hadoop Framework Integration
• Spring Hadoop: Enabling Spring to simplify Map-Reduce programming
• Spring Batch: Sophisticated batch management (Oozie on steroids)
15
Extend Virtual Storage Architecture to Include Local Disk
▪ Shared Storage: SAN or NAS
• Easy to provision
• Automated cluster rebalancing
▪ Hybrid Storage
• SAN for boot images, VMs, other workloads
• Local disk for Hadoop & HDFS
• Scalable bandwidth, lower cost/GB
[Diagram: six hosts, each running three VMs in a mix of Hadoop VMs and other VMs]
16
Performance Analysis of Big Data (Hadoop) on Virtualization
[Chart: ratio to native of time taken, y-axis from 0 to 1.2, series for 1 VM and 2 VMs. Lower is better.]
Tested on vSphere 5.0
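The chart's metric is elapsed time on the virtualized setup divided by elapsed time on native hardware, so 1.0 means parity and values below 1.0 mean the virtualized run finished sooner. A minimal sketch of the calculation follows; the timings in it are hypothetical placeholders, not results from the study.

```python
def ratio_to_native(virtual_seconds: float, native_seconds: float) -> float:
    """Elapsed-time ratio: below 1.0 means the virtualized run was faster."""
    if native_seconds <= 0:
        raise ValueError("native_seconds must be positive")
    return virtual_seconds / native_seconds


if __name__ == "__main__":
    # Hypothetical timings for illustration only.
    native = 100.0
    for label, seconds in (("1 VM", 104.0), ("2 VMs", 97.0)):
        print(f"{label}: ratio to native = {ratio_to_native(seconds, native):.2f}")
```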
17
Simplify Heterogeneous Data Management via Data PaaS
Layers: Cloud Infrastructure, Data Platform, Developer, Analytics Tools
Databases: File system, Big SQL, Large-Scale NoSQL, In-Memory
Data PaaS – Common Data Management Layer: Provisioning, Management, Multi-tenancy, Data Discovery, Import/Export
18
vFabric Data Director Powers Database-as-a-Service
[Diagram: vFabric Data Director on VMware vSphere]
Functions: Provisioning, Backup/Restore, Clone, One-click HA, Resource Mgmt, Security Mgmt, Database Templates, Monitor
Roles: DBA, App Dev, IT Admin
Themes: Automation, Self-Service, Policy Based Control
Applications: Existing Applications, New Applications
19
Data Systems: Databases, file systems
Layers: Cloud Infrastructure, Data Platform, Developer, Analytics Tools
Databases: File system, Big SQL, Large-Scale NoSQL, In-Memory
Spectrum: Unstructured to Structured
20
Technology: Databases and Data Stores for Big Data
Spectrum: Unstructured to Structured
Category | Types of data | Technologies | Values
File system | Log files, machine-generated data, documents, device data, etc. | NAS, HDFS, Blob (S3, Atmos, etc.) | Store any data, easy to scale out, can optimize for cost
Large-Scale NoSQL | Loosely typed device data, records, events, statistics, complex relations/graphs | Cassandra, HBase, Voldemort | Easy to scale out, flexible and dynamic schemas
In-Memory | Structured, partitionable data | GemFire, Redis, Membase | High throughput, low latency
Big SQL | Structured data | Greenplum, Sybase IQ, Aster Data, etc. | High performance for repetitive queries. Ease of query language.
21
Simplified Developer Experience through PaaS
Cloud Infrastructure
Data Platform
Developer
Analytics Tools
Databases
Platform as a Service
22
Spring Big Data Integrations
▪ NoSQL Integration
• Spring Data for MongoDB, GemFire, Riak, Neo4j, Blob, Cassandra
▪ Spring Hadoop
• Announced this week at Strata!
• Provides support for developing applications based on Hadoop technologies by leveraging the capabilities of the Spring ecosystem.
▪ Spring Batch
• Integration allows Hadoop jobs and HDFS operations as part of a workflow
23
The Unified Analytics Cloud Platform
Layers: Cloud Infrastructure (Private, Public), Data Platform, Developer Frameworks, Analytics Tools
Components shown: vSphere; Database/DataStore (Cassandra, Greenplum, HBase, Voldemort, HDFS); Data PaaS; PaaS; Hadoop; Python; MADlib; Cloud Foundry; Datameer; Karmasphere; Spring; Data-Director; EMC Chorus; Tableau
24
Summary
▪ Revolution in Big Data is under way
• Data-centric applications are now critical
▪ Hadoop on Virtualization
• Proven performance
• Cloud/Virtualization values apparent for Hadoop use
▪ Simplify through a Unified Analytics Cloud
• One platform for today's and future big-data systems
• Better utilization
• Faster deployment, elastic resources
• Secure, isolated, multi-tenant capability for analytics
25
References
▪ Twitter
• @richardmcdougll
▪ My CTO Blog
• http://communities.vmware.com/community/vmtn/cto/cloud
▪ Hadoop on vSphere
• Talk @ Hadoop World
• Performance Paper – http://www.vmware.com/files/.../VMW-Hadoop-Performance-vSphere5.pdf
▪ Spring Hadoop
• http://blog.springsource.org/2012/02/29/introducing-spring-hadoop

Architecting virtualized infrastructure for big data presentation

  • 1. © 2009 VMware Inc. All rights reserved Architecting Virtualized Infrastructure for Big Data Richard McDougall @richardmcdougll CTO, Application Infrastructure, Big Data Lead, VMware, Inc
  • 2. 2 Cloud: Big Shifts in Simplification and Optimization 2. Dramatically Lower Costs to redirect investment into value-add opportunities 3. Enable Flexible, Agile IT Service Delivery to meet and anticipate the needs of the business 1. Reduce the Complexity to simplify operations and maintenance
  • 3. Infrastructure, Apps and now Data…
      • Simplify Infrastructure With Cloud
      • Simplify App Platform Through PaaS
      • Simplify Data
      • Diagram labels: Private, Public; Build, Run, Manage
  • 4. Trend 1/3: New Data Growing at 60% Y/Y
      • Chart: exabytes of information stored, reaching 20 zettabytes by 2015 and 1 yottabyte by 2030 (Source: The Information Explosion, 2009)
      • Data sources shown: medical imaging, sensors, CAD/CAM, appliances, videoconferencing, digital movies, digital photos, digital TV, audio, camera phones, RFID, satellite images, games, scanners, Twitter
      • Yes, you are part of the yotta generation…
  • 5. Data Growth in the Enterprise
  • 6. Trend 2/3: Big Data – Driven by Real-World Benefit
  • 7. Trend 3/3: Value from Data Exceeds Hardware Cost
      • Value from the intelligence of data analytics now outstrips the cost of hardware
          - Hadoop enables the use of 10x lower cost hardware
          - Hardware cost halving every 18 months
      • Chart (Value vs. Cost): Big Iron at $40k/CPU; Commodity Cluster at $1k/CPU
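The halving claim on slide 7 can be read as exponential decay. The following is a minimal sketch, assuming cost falls smoothly by half every 18 months; the function name `projected_cost` and the sample time horizons are illustrative and do not come from the slides.

```python
def projected_cost(initial_usd: float, months: float, halving_months: float = 18.0) -> float:
    """Cost after `months`, assuming it halves every `halving_months` (an assumption)."""
    return initial_usd * 0.5 ** (months / halving_months)


# Starting points are the per-CPU figures quoted on the slide.
commodity_after_3_years = projected_cost(1_000, 36)    # two halving periods
big_iron_after_18_months = projected_cost(40_000, 18)  # one halving period
print(commodity_after_3_years, big_iron_after_18_months)
```

Whether both hardware classes follow the same curve is not stated on the slide, so treat the second call as a what-if rather than a forecast.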
  • 8. A Holistic View of a Big Data System (diagram)
      • Components shown: ETL; Real Time Streams; Unstructured Data (HDFS); Real Time Structured Database (hBase, Gemfire, Cassandra); Big SQL (Greenplum, AsterData, etc.); Batch Processing; Real-Time Processing (s4, storm); Analytics
  • 9. Big Data Frameworks and Characteristics (for each framework: scale of data; scale of cluster; computable data?; local disks?)
      • File System (Gluster, Isilon, etc.): 10s PB; 100s; No; Yes, for cost
      • Map-reduce (Hadoop): 100s PB; 1,000s; Yes; Yes, for cost and bandwidth
      • Big-SQL (Greenplum, Aster Data, Netezza, …): PBs; 100s; No; Yes, for cost and bandwidth
      • No-SQL (Cassandra, hBase, …): Trillions of rows; 100s; Future; Yes, for cost and availability
      • In-Memory (Redis, Gemfire, Membase, …): Billions of rows; 10s-100s; Hybrid Possible; Primarily Memory
  • 10. The Unified Analytics Cloud Platform (diagram)
      • Layers: Cloud Infrastructure, Data Platform, Developer Frameworks, Analytics Tools
      • Labels shown: vSphere; Private, Public; Database/DataStore (Cassandra, Greenplum, hBase, Voldemort, HDFS); Data PaaS; PaaS; Hadoop; Python; Madlib; Cloudfoundry; Data Meer; Karmasphere; Spring; Data-Director; EMC Chorus; Tableau
  • 11. Unifying the Big Data Platform using Virtualization
      • Goals
          - Make it fast and easy to provision new data Clusters on Demand
          - Allow Mixing of Workloads
          - Leverage virtual machines to provide isolation (esp. for Multi-tenant)
          - Optimize data performance based on virtual topologies
          - Make the system reliable based on virtual topologies
      • Leveraging Virtualization
          - Elastic scale
          - Use high-availability to protect key services, e.g., Hadoop’s namenode/job tracker
          - Resource controls and sharing: re-use underutilized memory, CPU
          - Prioritize Workloads: limit or guarantee resource usage in a mixed environment
  • 12. A Unified Analytics Cloud Significantly Simplifies
      • Diagram labels: SQL Cluster, Hadoop Cluster, NoSQL Cluster, Decision Support Cluster; Unified Analytics Infrastructure (Private, Public) with Hadoop, Big SQL, NoSQL
      • Simplify
          - Single Hardware Infrastructure
          - Faster/Easier provisioning
      • Optimize
          - Shared Resources = higher utilization
          - Elastic resources = faster on-demand access
  • 13. Use Local Disk where it’s Needed
      • SAN Storage: $2 - $10/Gigabyte; $1M gets 0.5 Petabytes, 200,000 IOPS, 1 Gbyte/sec
      • NAS Filers: $1 - $5/Gigabyte; $1M gets 1 Petabyte, 400,000 IOPS, 2 Gbyte/sec
      • Local Storage: $0.05/Gigabyte; $1M gets 20 Petabytes, 10,000,000 IOPS, 800 Gbytes/sec
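The capacity figures on slide 13 follow from dividing the budget by the price per gigabyte. The sketch below reproduces that arithmetic, assuming the slide used the low end of each quoted price range and decimal units (1 PB = 1,000,000 GB); the helper name `petabytes_for_budget` is illustrative.

```python
GB_PER_PB = 1_000_000  # decimal units; an assumption about the slide's convention


def petabytes_for_budget(budget_usd: float, usd_per_gb: float) -> float:
    """Capacity in petabytes that a budget buys at a given price per gigabyte."""
    return budget_usd / usd_per_gb / GB_PER_PB


BUDGET_USD = 1_000_000
# Low end of each price range quoted on the slide, in USD per gigabyte.
price_per_gb = {"SAN Storage": 2.00, "NAS Filers": 1.00, "Local Storage": 0.05}

capacity_pb = {
    tier: petabytes_for_budget(BUDGET_USD, price) for tier, price in price_per_gb.items()
}
for tier, pb in capacity_pb.items():
    print(f"{tier}: {pb:g} PB")
```

Using the high end of the SAN and NAS ranges instead would shrink their capacity by a factor of five, which widens the gap to local disk further.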
  • 14. VMware is Committed to the Best Virtual Platform for Hadoop
      • Performance Studies and Best Practices
          - Studies through 2010-2011 of Hadoop 0.20 on vSphere 5
          - White paper, including detailed configurations and recommendations
      • Making Hadoop run well on vSphere
          - Performance optimizations in vSphere releases
          - VMware engagement in Hadoop Community effort
          - Supporting key partners with their distributions on vSphere
          - Contributing enhancements to Hadoop
      • Hadoop Framework Integration
          - Spring Hadoop: Enabling Spring to simplify Map-Reduce Programming
          - Spring Batch: Sophisticated batch management (Oozie on steroids)
  • 15. Extend Virtual Storage Architecture to Include Local Disk
      • Shared Storage: SAN or NAS
          - Easy to provision
          - Automated cluster rebalancing
      • Hybrid Storage
          - SAN for boot images, VMs, other workloads
          - Local disk for Hadoop & HDFS
          - Scalable Bandwidth, Lower Cost/GB
      • Diagram: six hosts, each running one or two Hadoop VMs alongside other VMs
  • 16. Performance Analysis of Big Data (Hadoop) on Virtualization
      • Chart: ratio of time taken relative to native (axis 0 to 1.2) for 1 VM and 2 VMs; lower is better. Tested on vSphere 5.0
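The metric on slide 16 is elapsed time under virtualization divided by elapsed time on native hardware, so a value of 1.0 means parity. A minimal sketch of that calculation follows; the function name and the timings are hypothetical and are not measurements from the VMware study.

```python
def ratio_to_native(virtual_seconds: float, native_seconds: float) -> float:
    """Elapsed time on VMs divided by elapsed time on bare metal (lower is better)."""
    if native_seconds <= 0:
        raise ValueError("native_seconds must be positive")
    return virtual_seconds / native_seconds


# Hypothetical timings for illustration only, NOT results from the slides.
native_run = 100.0
one_vm_run = 104.0
ratio = ratio_to_native(one_vm_run, native_run)
print(f"ratio to native: {ratio:.2f}")
```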
  • 17. Simplify Heterogeneous Data Management via Data PaaS
      • Layers shown: Cloud Infrastructure, Data Platform, Developer, Analytics Tools
      • Databases: File-system, Big SQL, Large-Scale NoSQL, In-Memory
      • Data PaaS – Common Data Management Layer: Provisioning, Management, Multi-tenancy, Data Discovery, Import/Export
  • 18. vFabric Data Director Powers Database-as-a-Service
      • Platform: VMware vSphere
      • Capabilities: Provisioning, Backup/Restore, Clone, One click HA, Resource Mgmt, Security Mgmt, Database Templates, Monitor
      • Roles: DBA, App Dev, IT Admin
      • Automation, Self-Service, Policy Based Control
      • Existing Applications, New Applications
  • 19. Data Systems: Databases, file systems
      • Layers shown: Cloud Infrastructure, Data Platform, Developer, Analytics Tools
      • Databases: File-system, Big SQL, Large-Scale NoSQL, In-Memory, placed on an axis from Unstructured to Structured
  • 20. Technology: Databases and Data Stores for Big Data (categories placed on an axis from Unstructured to Structured)
      • File-system
          - Types of data: Log files, machine generated data, documents, device data, etc.
          - Technologies: NAS, HDFS, Blob (S3, Atmos, etc.)
          - Values: Store any data, easy to scale-out, can optimize for cost
      • Large-Scale NoSQL
          - Types of data: Loosely typed device data, records, events, statistics, complex relations/graphs
          - Technologies: Cassandra, hBase, Voldemort
          - Values: Easy to scale-out, flexible and dynamic schemas
      • In-Memory
          - Types of data: Structured, partitionable data
          - Technologies: Gemfire, Redis, Membase
          - Values: High Throughput, low latency
      • Big SQL
          - Types of data: Structured data
          - Technologies: Greenplum, Sybase IQ, Aster Data, etc.
          - Values: High performance for repetitive queries. Ease of query language.
  • 21. Simplified Developer Experience through PaaS
      • Diagram labels: Cloud Infrastructure, Data Platform, Developer, Analytics Tools, Databases, Platform as a Service
  • 22. Spring Big Data Integrations
      • NoSQL Integration
          - Spring Data for MongoDB, Gemfire, Riak, Neo4j, Blob, Cassandra
      • Spring Hadoop
          - Announced this week at Strata!
          - Provides support for developing applications based on Hadoop technologies by leveraging the capabilities of the Spring ecosystem.
      • Spring Batch
          - Integration allows Hadoop jobs and HDFS operations as part of workflow
  • 23. The Unified Analytics Cloud Platform (diagram, repeated from slide 10)
      • Layers: Cloud Infrastructure, Data Platform, Developer Frameworks, Analytics Tools
      • Labels shown: vSphere; Private, Public; Database/DataStore (Cassandra, Greenplum, hBase, Voldemort, HDFS); Data PaaS; PaaS; Hadoop; Python; Madlib; Cloudfoundry; Data Meer; Karmasphere; Spring; Data-Director; EMC Chorus; Tableau
  • 24. Summary
      • Revolution in Big Data is under way
          - Data centric applications are now critical
      • Hadoop on Virtualization
          - Proven performance
          - Cloud/Virtualization values apparent for Hadoop use
      • Simplify through a Unified Analytics Cloud
          - One Platform for today’s and future big-data systems
          - Better Utilization
          - Faster deployment, elastic resources
          - Secure, Isolated, Multi-tenant capability for Analytics
  • 25. References
      • Twitter: @richardmcdougll
      • My CTO Blog: http://communities.vmware.com/community/vmtn/cto/cloud
      • Hadoop on vSphere
          - Talk @ Hadoop World
          - Performance Paper: http://www.vmware.com/files/.../VMW-Hadoop-Performance-vSphere5.pdf
      • Spring Hadoop: http://blog.springsource.org/2012/02/29/introducing-spring-hadoop