Hadoop in Virtual Machines
     Richard McDougall, VMware
      Sanjay Radia, Hortonworks

       Hadoop Summit, 2012
Part 1
Say What?
•   VMs will just add overhead, due to I/O virt
•   VMs run on SAN, we’re all about local disks
•   Hadoop does its own cluster management
•   It’ll do resource management in 2.0
•   And even HA is coming to Hadoop

• And… what is the point, anyway?
But you’ve been asking…
• Can I virtualize my Hadoop to make it easier
  and quicker to get a cluster up and running?
• Is it possible to run Hadoop on those spare
  machine cycles I have on hundreds/thousands
  of nodes?
• Can I make my system more available by using
  some of the standard HA features?
And the savvy are asking…
• Can I avoid having to install special hardware
  for the master services, like name-node, job-
  tracker?
• Can I dynamically change the size of the
  cluster to use more resources?
• Can I use VM isolation to increase security or
  guard against resource-intensive neighbors?
• Is it feasible to provision virtual-clusters, giving
  out one each to a business unit?
Ok, so first what about the concerns?
• Use your SAN? … if you want to.




                 SAN Storage         NAS Filers          Local Storage

 Cost            $2 - $10/Gigabyte   $1 - $5/Gigabyte    $0.05/Gigabyte

 $1M gets:
   Capacity      0.5 Petabytes       1 Petabyte          20 Petabytes
   IOPS          1,000,000           400,000             10,000,000
   Throughput    1 Gbyte/sec         2 Gbytes/sec        800 Gbytes/sec
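
The capacity row follows from dividing the budget by the price per gigabyte. A minimal sketch of that arithmetic, assuming decimal units (1 PB = 1,000,000 GB) and the low end of each price range quoted on the slide; the function name is hypothetical:

```python
# Capacity that a fixed budget buys at a given price per gigabyte.
# Assumption of this sketch: decimal units, 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000


def petabytes_for_budget(budget_usd: float, usd_per_gb: float) -> float:
    """Return how many petabytes `budget_usd` buys at `usd_per_gb`."""
    return budget_usd / usd_per_gb / GB_PER_PB


budget = 1_000_000
# Low end of each price range from the slide.
prices = {
    "SAN storage ($2/GB)": 2.0,
    "NAS filers ($1/GB)": 1.0,
    "Local storage ($0.05/GB)": 0.05,
}
capacities = {name: petabytes_for_budget(budget, p) for name, p in prices.items()}
for name, pb in capacities.items():
    print(f"{name}: {pb:.1f} PB")
```

At the high end of the SAN and NAS ranges ($10 and $5 per gigabyte) the same budget buys 0.1 PB and 0.2 PB, so the table shows the best case for both.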
Hadoop Using Local Disks

[Diagram: a virtualization host running other workloads and a Hadoop virtual machine (Task Tracker, Datanode). Labels: OS Image - VMDK; three Ext4 file systems, each on its own VMDK; Shared Storage.]
Hadoop Perf in a VM
(Ratio is elapsed time to physical, Lower Is Better)
[Chart: ratio to native, y-axis from 0 to 1.2; series: 1 VM, 2 VMs]
Evolution of Hadoop on VMs
Hadoop in VM (current Hadoop: combined storage/compute in one VM)
- VM lifecycle determined by Datanode
- NOT elastic
- Limited to Hadoop multi-tenancy

Separate Storage (compute VM separate from storage VM)
- Separate compute from data
- Elastic compute
- Enable shared workloads
- Raise utilization

Separate Compute Clusters (per-tenant compute VMs T1 and T2, plus a storage VM)
- Separate virtual clusters per tenant
- Stronger VM-grade security and resource isolation
- Enable deployment of multiple Hadoop runtime versions
1. Hadoop Task Tracker and Data Node in a VM

[Diagram: a virtualization host running other workloads and one virtual Hadoop node (Task Tracker with slots, Datanode on a VMDK). Callouts: Add/Remove Slots? Grow/Shrink by tens of GB?]

Grow/Shrink of a VM is one approach
2. Add/remove Virtual Nodes

[Diagram: a virtualization host running other workloads and two virtual Hadoop nodes, each with a Task Tracker (with slots) and a Datanode on its own VMDK.]

Just add/remove more virtual nodes?
But state makes it hard to power off a node

[Diagram: a virtualization host running other workloads and one virtual Hadoop node (Task Tracker with slots, Datanode on a VMDK).]

Powering off the Hadoop VM would in effect fail the datanode
Adding a node needs data…

[Diagram: a virtualization host running other workloads and two virtual Hadoop nodes, each with a Task Tracker (with slots) and a Datanode on its own VMDK.]

Adding a node would require TBs of data replication
3. Separated Compute and Data

[Diagram: a virtualization host running other workloads, several compute-only virtual Hadoop nodes (each a Task Tracker with slots), and a separate virtual Hadoop node running the Datanode on VMDKs.]

Truly Elastic Hadoop: scalable through virtual nodes
Dataflow with separated Compute/Data

[Diagram: on one virtualization host, a compute virtual Hadoop node (NodeManager with slots) and a data virtual Hadoop node (Datanode on a VMDK). Each has a virtual NIC attached to the virtual switch, which sits above the NIC drivers.]
Performance Analysis of Split

Configurations compared:
- Combined: 1 combined compute/Datanode VM per host (Node Manager and Datanode in the same VM)
- Split: 1 Datanode VM and 1 compute node VM per host (Node Manager and Datanode in separate VMs)




 Workload: Teragen, Terasort, Teravalidate
 HW Configuration: 8 cores, 96GB RAM, 16 disks per host x 2 nodes
Performance Analysis of Split
                (Elapsed time: ratio to combined)
[Chart: elapsed time as a ratio to the combined configuration, y-axis from 0 to 1.2; series: Combined, Split; workloads: Teragen, Terasort, Teravalidate]
Tying it together: Elastic Hadoop
[Diagram: two tenants, Coke and Pepsi. Runtime layer: several virtual Hadoop clusters and a queue. Data layer: three namespaces on a distributed file system (HDFS, KFS, GPFS, MapR, Isilon,…) spanning six hosts.]
Demo: Shrink/Expand Cluster
Set up 1 Datanode, 2 NodeManagers and 2 web servers on
each physical host

[Diagram: four physical hosts, each running two Web Server VMs, two NodeManager VMs and one Datanode VM]
Demo: Shrink/Expand Cluster
When web load is high in the daytime, we can suspend some NodeManagers and
power on more web servers.

[Diagram: the same four physical hosts, each with Web Server, NodeManager and Datanode VMs]
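
The shrink/expand decision can be thought of as a per-host policy. The sketch below is purely illustrative and is not the logic used in the demo: the names `HostPlan` and `plan_host`, the four elastic VM slots per host, and the 0.7 load threshold are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class HostPlan:
    web_servers: int
    node_managers: int
    datanodes: int = 1  # the Datanode VM stays up: it holds the HDFS blocks


def plan_host(web_load: float, elastic_slots: int = 4,
              baseline_web: int = 2, high_load: float = 0.7) -> HostPlan:
    """Decide how many web server and NodeManager VMs one host should run.

    `web_load` is a normalized load in [0, 1]. The threshold and slot
    counts are hypothetical values chosen for illustration.
    """
    if not 0.0 <= web_load <= 1.0:
        raise ValueError("web_load must be between 0 and 1")
    web = baseline_web
    if web_load >= high_load:
        # Suspend NodeManagers to make room, but keep one running.
        web = elastic_slots - 1
    return HostPlan(web_servers=web, node_managers=elastic_slots - web)


print(plan_host(0.3))  # night: the baseline 2 web servers and 2 NodeManagers
print(plan_host(0.9))  # daytime peak: one NodeManager is suspended
```

Only the compute VMs change state here. Because the Datanode runs in its own VM, suspending a NodeManager does not take any HDFS data offline.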
Demo
Part 2
Expand Hadoop Ecosystem
• Hortonworks goal
  – Expand Hadoop ecosystem
  – Provide first-class support for various platforms
• Hadoop should run well on VMs
     • VMs offer several advantages as presented earlier
• Take advantage of vSphere for HA



VMware-Hortonworks Joint Engineering
• First-class support for VMs
  – Topology plugins (HADOOP-8468)
     • 2 VMs can be on same host
         – Pick closer data
         – Schedule tasks closer
         – Don’t put two replicas on same host
  – MR-tmp on HDFS using block pools
     • Elastic Compute-VMs will not need local disk
  – Fast communications within VMs
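
The "don't put two replicas on same host" rule can be illustrated with a small placement sketch. This is not the HADOOP-8468 implementation, and real HDFS placement also weighs racks and writer locality; the function name and the VM-to-host map are hypothetical.

```python
def place_replicas(vm_to_host: dict[str, str], replicas: int = 3) -> list[str]:
    """Pick Datanode VMs for one block so that no two share a physical host.

    `vm_to_host` maps each Datanode VM name to the physical host it runs on.
    VMs are considered in name order to keep the example deterministic.
    """
    chosen: list[str] = []
    used_hosts: set[str] = set()
    for vm, host in sorted(vm_to_host.items()):
        if host not in used_hosts:
            chosen.append(vm)
            used_hosts.add(host)
        if len(chosen) == replicas:
            return chosen
    raise ValueError("not enough distinct physical hosts for the requested replicas")


# Two VMs share hostA, so only one of them may hold a replica.
cluster = {
    "dn-vm1": "hostA",
    "dn-vm2": "hostA",
    "dn-vm3": "hostB",
    "dn-vm4": "hostC",
}
print(place_replicas(cluster))  # ['dn-vm1', 'dn-vm3', 'dn-vm4']
```

Without host awareness, a placement policy that sees only VMs could put two replicas on dn-vm1 and dn-vm2, and a single failure of hostA would then lose both copies.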

Hadoop Total System Availability Architecture

[Diagram: apps running outside the cluster submit jobs to the slave nodes of the Hadoop cluster. The master daemons (NN, JT) run on servers in an HA cluster for master daemons. Labels: Failover; JT into Safemode; N+K failover.]
HA is coming in 1.0
Using Total System Availability Architecture




 © Hortonworks Inc. 2011
HA in Hadoop 1 with HDP1
• Total System Availability Architecture
   – Namenode
      • Clients pause automatically
      • JobTracker pauses automatically
   – Other Hadoop master services (JT, …) coming

• Use industry proven HA framework
   – VMware vSphere-HA
      • Failover, fencing, …
      • Corner cases are tricky; if not addressed, they can lead to corruption
   – Additional benefits:
      • N-N & N+K failover
      • Migration for maintenance

Hadoop NN/JT HA with vSphere




NameNode HA – Failover Times

• NameNode Failover times with vSphere and LinuxHA
   – Failure detection + Failover – 0.5 to 2 minutes
   – OS bootup needed for vSphere – 10-20 seconds
   – Namenode Startup (exit safemode)
       • Small/Medium clusters – 1 to 2 minutes
       • Large cluster – 5 to 15 minutes

• NameNode startup time measurements
   – 60 Nodes, 60K files, 6 million blocks, 300 TB raw storage – 40 sec
   – 180 Nodes, 200K files, 18 million blocks, 900TB raw storage – 120 sec

  Cold Failover is good enough for small/medium clusters
       Failure Detection and Automatic Failover Dominates
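
As a rough check, the phase ranges above can be summed into an end-to-end estimate. This sketch assumes the phases run strictly one after another and includes the OS boot phase, so it models the vSphere case; the constant and function names are hypothetical.

```python
# Phase durations in seconds, taken from the ranges on the slide.
# The OS boot phase applies to the vSphere case only.
DETECT_AND_FAILOVER = (30, 120)      # 0.5 to 2 minutes
OS_BOOT = (10, 20)                   # 10 to 20 seconds
NN_STARTUP_SMALL_MEDIUM = (60, 120)  # 1 to 2 minutes
NN_STARTUP_LARGE = (300, 900)        # 5 to 15 minutes


def total_range(*phases):
    """Sum (low, high) second ranges, assuming the phases run back to back."""
    return (sum(p[0] for p in phases), sum(p[1] for p in phases))


small = total_range(DETECT_AND_FAILOVER, OS_BOOT, NN_STARTUP_SMALL_MEDIUM)
large = total_range(DETECT_AND_FAILOVER, OS_BOOT, NN_STARTUP_LARGE)
print(small)  # (100, 260)
print(large)  # (340, 1040)

# Startup rate implied by the two measurements (blocks per second).
rate_60_nodes = 6_000_000 / 40
rate_180_nodes = 18_000_000 / 120
print(rate_60_nodes, rate_180_nodes)  # 150000.0 150000.0
```

Under these assumptions a small or medium cluster is back in roughly 2 to 4.5 minutes. Both startup measurements work out to the same 150,000 blocks per second, which suggests startup time grew in proportion to block count across those two clusters.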
Demo
Summary
• Advantages of Hadoop on VMs
  – Cluster Management
  – Cluster consolidation
  – Greater Elasticity in mixed environment
  – Alternative multi-tenancy model to the Capacity
    Scheduler's offerings
• HA for Hadoop Master Daemons
  – vSphere based HA for NN, JT, … in Hadoop 1
  – Total System Availability Architecture

                                                      Page 33

Contenu connexe

Tendances

Gluster Webinar: Introduction to GlusterFS
Gluster Webinar: Introduction to GlusterFSGluster Webinar: Introduction to GlusterFS
Gluster Webinar: Introduction to GlusterFSGlusterFS
 
Cloud Storage Adoption, Practice, and Deployment
Cloud Storage Adoption, Practice, and DeploymentCloud Storage Adoption, Practice, and Deployment
Cloud Storage Adoption, Practice, and DeploymentGlusterFS
 
Operate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmineOperate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmineDataWorks Summit
 
Common and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopCommon and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopBrock Noland
 
HDFS Futures: NameNode Federation for Improved Efficiency and Scalability
HDFS Futures: NameNode Federation for Improved Efficiency and ScalabilityHDFS Futures: NameNode Federation for Improved Efficiency and Scalability
HDFS Futures: NameNode Federation for Improved Efficiency and ScalabilityHortonworks
 
CloudStack Architecture Future
CloudStack Architecture FutureCloudStack Architecture Future
CloudStack Architecture FutureKimihiko Kitase
 
CloudStack-Developer-Day
CloudStack-Developer-DayCloudStack-Developer-Day
CloudStack-Developer-DayKimihiko Kitase
 
Presentation introduction to cloud computing and technical issues
Presentation   introduction to cloud computing and technical issuesPresentation   introduction to cloud computing and technical issues
Presentation introduction to cloud computing and technical issuesxKinAnx
 
Using Distributed In-Memory Computing for Fast Data Analysis
Using Distributed In-Memory Computing for Fast Data AnalysisUsing Distributed In-Memory Computing for Fast Data Analysis
Using Distributed In-Memory Computing for Fast Data AnalysisScaleOut Software
 
Postgres Plus Cloud Database
Postgres Plus Cloud DatabasePostgres Plus Cloud Database
Postgres Plus Cloud DatabaseGary Carter
 
Vizuri Exadata East Coast Users Conference
Vizuri Exadata East Coast Users ConferenceVizuri Exadata East Coast Users Conference
Vizuri Exadata East Coast Users ConferenceIsaac Christoffersen
 
Apache CloudStack Architecture by Alex Huang
Apache CloudStack Architecture by Alex HuangApache CloudStack Architecture by Alex Huang
Apache CloudStack Architecture by Alex Huangbuildacloud
 
Virtualization in the Cloud @ Build a Cloud Day SFO May 2012
Virtualization in the Cloud @ Build a Cloud Day SFO May 2012Virtualization in the Cloud @ Build a Cloud Day SFO May 2012
Virtualization in the Cloud @ Build a Cloud Day SFO May 2012The Linux Foundation
 
Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014Ryu Kobayashi
 
Geo-based content processing using hbase
Geo-based content processing using hbaseGeo-based content processing using hbase
Geo-based content processing using hbaseRavi Veeramachaneni
 
Open solaris customer presentation
Open solaris customer presentationOpen solaris customer presentation
Open solaris customer presentationxKinAnx
 

Tendances (20)

Gluster Webinar: Introduction to GlusterFS
Gluster Webinar: Introduction to GlusterFSGluster Webinar: Introduction to GlusterFS
Gluster Webinar: Introduction to GlusterFS
 
Cloud Storage Adoption, Practice, and Deployment
Cloud Storage Adoption, Practice, and DeploymentCloud Storage Adoption, Practice, and Deployment
Cloud Storage Adoption, Practice, and Deployment
 
Google Compute and MapR
Google Compute and MapRGoogle Compute and MapR
Google Compute and MapR
 
Operate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmineOperate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmine
 
Common and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopCommon and unique use cases for Apache Hadoop
Common and unique use cases for Apache Hadoop
 
cosbench-openstack.pdf
cosbench-openstack.pdfcosbench-openstack.pdf
cosbench-openstack.pdf
 
HDFS Futures: NameNode Federation for Improved Efficiency and Scalability
HDFS Futures: NameNode Federation for Improved Efficiency and ScalabilityHDFS Futures: NameNode Federation for Improved Efficiency and Scalability
HDFS Futures: NameNode Federation for Improved Efficiency and Scalability
 
Cosbench apac
Cosbench apacCosbench apac
Cosbench apac
 
CloudStack Architecture Future
CloudStack Architecture FutureCloudStack Architecture Future
CloudStack Architecture Future
 
CloudStack-Developer-Day
CloudStack-Developer-DayCloudStack-Developer-Day
CloudStack-Developer-Day
 
Presentation introduction to cloud computing and technical issues
Presentation   introduction to cloud computing and technical issuesPresentation   introduction to cloud computing and technical issues
Presentation introduction to cloud computing and technical issues
 
Using Distributed In-Memory Computing for Fast Data Analysis
Using Distributed In-Memory Computing for Fast Data AnalysisUsing Distributed In-Memory Computing for Fast Data Analysis
Using Distributed In-Memory Computing for Fast Data Analysis
 
Postgres Plus Cloud Database
Postgres Plus Cloud DatabasePostgres Plus Cloud Database
Postgres Plus Cloud Database
 
Vizuri Exadata East Coast Users Conference
Vizuri Exadata East Coast Users ConferenceVizuri Exadata East Coast Users Conference
Vizuri Exadata East Coast Users Conference
 
Apache CloudStack Architecture by Alex Huang
Apache CloudStack Architecture by Alex HuangApache CloudStack Architecture by Alex Huang
Apache CloudStack Architecture by Alex Huang
 
Management server internals
Management server internalsManagement server internals
Management server internals
 
Virtualization in the Cloud @ Build a Cloud Day SFO May 2012
Virtualization in the Cloud @ Build a Cloud Day SFO May 2012Virtualization in the Cloud @ Build a Cloud Day SFO May 2012
Virtualization in the Cloud @ Build a Cloud Day SFO May 2012
 
Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014
 
Geo-based content processing using hbase
Geo-based content processing using hbaseGeo-based content processing using hbase
Geo-based content processing using hbase
 
Open solaris customer presentation
Open solaris customer presentationOpen solaris customer presentation
Open solaris customer presentation
 

En vedette

Best Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache HadoopBest Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache HadoopHortonworks
 
Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud? Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud? DataWorks Summit
 
Best Practices for Virtualizing Hadoop
Best Practices for Virtualizing HadoopBest Practices for Virtualizing Hadoop
Best Practices for Virtualizing HadoopDataWorks Summit
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesDataWorks Summit
 
Big Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsBig Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsRichard McDougall
 
Virtualization Primer for Java Developers
Virtualization Primer for Java DevelopersVirtualization Primer for Java Developers
Virtualization Primer for Java DevelopersRichard McDougall
 
Solaris Internals Preso circa 2009
Solaris Internals Preso circa 2009Solaris Internals Preso circa 2009
Solaris Internals Preso circa 2009Richard McDougall
 
Building Big Data Applications
Building Big Data ApplicationsBuilding Big Data Applications
Building Big Data ApplicationsRichard McDougall
 
Virtualizing Oracle Databases with VMware
Virtualizing Oracle Databases with VMwareVirtualizing Oracle Databases with VMware
Virtualizing Oracle Databases with VMwareRichard McDougall
 
VMware Performance Troubleshooting
VMware Performance TroubleshootingVMware Performance Troubleshooting
VMware Performance Troubleshootingglbsolutions
 
Denver VMUG nov 2011
Denver VMUG nov 2011Denver VMUG nov 2011
Denver VMUG nov 2011Dan Brinkmann
 
Citrix Remote Access Solution Soup
Citrix Remote Access Solution SoupCitrix Remote Access Solution Soup
Citrix Remote Access Solution SoupDan Brinkmann
 
VMware vSphere Performance Troubleshooting
VMware vSphere Performance TroubleshootingVMware vSphere Performance Troubleshooting
VMware vSphere Performance TroubleshootingDan Brinkmann
 
VMware Advance Troubleshooting Workshop - Day 5
VMware Advance Troubleshooting Workshop - Day 5VMware Advance Troubleshooting Workshop - Day 5
VMware Advance Troubleshooting Workshop - Day 5Vepsun Technologies
 
VMware Advance Troubleshooting Workshop - Day 2
VMware Advance Troubleshooting Workshop - Day 2VMware Advance Troubleshooting Workshop - Day 2
VMware Advance Troubleshooting Workshop - Day 2Vepsun Technologies
 
VMware Advance Troubleshooting Workshop - Day 3
VMware Advance Troubleshooting Workshop - Day 3VMware Advance Troubleshooting Workshop - Day 3
VMware Advance Troubleshooting Workshop - Day 3Vepsun Technologies
 
VMware Advance Troubleshooting Workshop - Day 4
VMware Advance Troubleshooting Workshop - Day 4VMware Advance Troubleshooting Workshop - Day 4
VMware Advance Troubleshooting Workshop - Day 4Vepsun Technologies
 
VMware Advance Troubleshooting Workshop - Day 6
VMware Advance Troubleshooting Workshop - Day 6VMware Advance Troubleshooting Workshop - Day 6
VMware Advance Troubleshooting Workshop - Day 6Vepsun Technologies
 

En vedette (20)

Best Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache HadoopBest Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache Hadoop
 
Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud? Where to Deploy Hadoop: Bare Metal or Cloud?
Where to Deploy Hadoop: Bare Metal or Cloud?
 
Best Practices for Virtualizing Hadoop
Best Practices for Virtualizing HadoopBest Practices for Virtualizing Hadoop
Best Practices for Virtualizing Hadoop
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual Machines
 
Making of the Burner Board
Making of the Burner BoardMaking of the Burner Board
Making of the Burner Board
 
Big Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsBig Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure Considerations
 
Virtualization Primer for Java Developers
Virtualization Primer for Java DevelopersVirtualization Primer for Java Developers
Virtualization Primer for Java Developers
 
Solaris Internals Preso circa 2009
Solaris Internals Preso circa 2009Solaris Internals Preso circa 2009
Solaris Internals Preso circa 2009
 
Building Big Data Applications
Building Big Data ApplicationsBuilding Big Data Applications
Building Big Data Applications
 
Virtualizing Oracle Databases with VMware
Virtualizing Oracle Databases with VMwareVirtualizing Oracle Databases with VMware
Virtualizing Oracle Databases with VMware
 
Hadoop I/O Analysis
Hadoop I/O AnalysisHadoop I/O Analysis
Hadoop I/O Analysis
 
VMware Performance Troubleshooting
VMware Performance TroubleshootingVMware Performance Troubleshooting
VMware Performance Troubleshooting
 
Denver VMUG nov 2011
Denver VMUG nov 2011Denver VMUG nov 2011
Denver VMUG nov 2011
 
Citrix Remote Access Solution Soup
Citrix Remote Access Solution SoupCitrix Remote Access Solution Soup
Citrix Remote Access Solution Soup
 
VMware vSphere Performance Troubleshooting
VMware vSphere Performance TroubleshootingVMware vSphere Performance Troubleshooting
VMware vSphere Performance Troubleshooting
 
VMware Advance Troubleshooting Workshop - Day 5
VMware Advance Troubleshooting Workshop - Day 5VMware Advance Troubleshooting Workshop - Day 5
VMware Advance Troubleshooting Workshop - Day 5
 
VMware Advance Troubleshooting Workshop - Day 2
VMware Advance Troubleshooting Workshop - Day 2VMware Advance Troubleshooting Workshop - Day 2
VMware Advance Troubleshooting Workshop - Day 2
 
VMware Advance Troubleshooting Workshop - Day 3
VMware Advance Troubleshooting Workshop - Day 3VMware Advance Troubleshooting Workshop - Day 3
VMware Advance Troubleshooting Workshop - Day 3
 
VMware Advance Troubleshooting Workshop - Day 4
VMware Advance Troubleshooting Workshop - Day 4VMware Advance Troubleshooting Workshop - Day 4
VMware Advance Troubleshooting Workshop - Day 4
 
VMware Advance Troubleshooting Workshop - Day 6
VMware Advance Troubleshooting Workshop - Day 6VMware Advance Troubleshooting Workshop - Day 6
VMware Advance Troubleshooting Workshop - Day 6
 

Similaire à Hadoop on Virtual Machines

Distributed Stream Processing on Fluentd / #fluentd
Distributed Stream Processing on Fluentd / #fluentdDistributed Stream Processing on Fluentd / #fluentd
Distributed Stream Processing on Fluentd / #fluentdSATOSHI TAGOMORI
 
Hadoop World 2011: Hadoop as a Service in Cloud
Hadoop World 2011: Hadoop as a Service in CloudHadoop World 2011: Hadoop as a Service in Cloud
Hadoop World 2011: Hadoop as a Service in CloudCloudera, Inc.
 
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesSQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesOReillyStrata
 
Introduction to Apache Accumulo
Introduction to Apache AccumuloIntroduction to Apache Accumulo
Introduction to Apache AccumuloJared Winick
 
SAP Virtualization Week 2012 - The Lego Cloud
SAP Virtualization Week 2012 - The Lego CloudSAP Virtualization Week 2012 - The Lego Cloud
SAP Virtualization Week 2012 - The Lego Cloudaidanshribman
 
App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)outstanding59
 
HA Hadoop -ApacheCon talk
HA Hadoop -ApacheCon talkHA Hadoop -ApacheCon talk
HA Hadoop -ApacheCon talkSteve Loughran
 
Windows server 2012 failover clustering improvements
Windows server 2012   failover clustering improvementsWindows server 2012   failover clustering improvements
Windows server 2012 failover clustering improvementsSusantha Silva
 
21.10.09 Microsoft Event, Microsoft Presentation
21.10.09 Microsoft Event, Microsoft Presentation21.10.09 Microsoft Event, Microsoft Presentation
21.10.09 Microsoft Event, Microsoft Presentationdataplex systems limited
 
MEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop MicrosoftMEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop MicrosoftLee Stott
 
Architecting data center networks in the era of big data and cloud
Architecting data center networks in the era of big data and cloudArchitecting data center networks in the era of big data and cloud
Architecting data center networks in the era of big data and cloudbradhedlund
 
Savanna: Hadoop on OpenStack
Savanna: Hadoop on OpenStackSavanna: Hadoop on OpenStack
Savanna: Hadoop on OpenStackMirantis
 
Why Virtualization is important by Tom Phelan of BlueData
Why Virtualization is important by Tom Phelan of BlueDataWhy Virtualization is important by Tom Phelan of BlueData
Why Virtualization is important by Tom Phelan of BlueDataData Con LA
 
Microsoft dagen windows server 2012
Microsoft dagen   windows server 2012Microsoft dagen   windows server 2012
Microsoft dagen windows server 2012Olav Tvedt
 
Virtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In ChineseVirtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In Chinese天青 王
 
4. v sphere big data extensions hadoop
4. v sphere big data extensions   hadoop4. v sphere big data extensions   hadoop
4. v sphere big data extensions hadoopChiou-Nan Chen
 
Deploying Baremetal Instances with OpenStack
Deploying Baremetal Instances with OpenStackDeploying Baremetal Instances with OpenStack
Deploying Baremetal Instances with OpenStackEtsuji Nakai
 
Lego Cloud SAP Virtualization Week 2012
Lego Cloud SAP Virtualization Week 2012Lego Cloud SAP Virtualization Week 2012
Lego Cloud SAP Virtualization Week 2012Benoit Hudzia
 

Similaire à Hadoop on Virtual Machines (20)

Distributed Stream Processing on Fluentd / #fluentd
Distributed Stream Processing on Fluentd / #fluentdDistributed Stream Processing on Fluentd / #fluentd
Hadoop on Virtual Machines

  • 1. Hadoop in Virtual Machines. Richard McDougall, VMware; Sanjay Radia, Hortonworks. Hadoop Summit, 2012
  • 3. Say What? • VMs will just add overhead, due to I/O virtualization • VMs run on SAN; we’re all about local disks • Hadoop does its own cluster management • It’ll do resource management in 2.0 • And even HA is coming to Hadoop • And… what is the point, anyway?
  • 4. But you’ve been asking… • Can I virtualize my Hadoop, so that it is easier and quicker to get a cluster up and running? • Is it possible to run Hadoop on those spare machine cycles I have on hundreds/thousands of nodes? • Can I make my system more available by using some of the standard HA features?
  • 5. And the savvy are asking… • Can I avoid having to install special hardware for the master services, like name-node and job-tracker? • Can I dynamically change the size of the cluster to use more resources? • Can I use VM isolation to increase security or guard against resource-intensive neighbors? • Is it feasible to provision virtual clusters, giving out one each to a business unit?
  • 6. Ok, so first what about the concerns? • Use your SAN? … if you want to. • SAN Storage: $2 - $10/Gigabyte; $1M gets 0.5 Petabytes, 1,000,000 IOPS, 1 Gbyte/sec • NAS Filers: $1 - $5/Gigabyte; $1M gets 1 Petabyte, 400,000 IOPS, 2 Gbytes/sec • Local Storage: $0.05/Gigabyte; $1M gets 20 Petabytes, 10,000,000 IOPS, 800 Gbytes/sec
  • 7. Hadoop Using Local Disks [Diagram: a Hadoop virtual machine (Task Tracker, Datanode) and another workload on a virtualization host; Ext4 filesystems on VMDKs; OS image VMDK; shared storage]
  • 8. Hadoop Perf in a VM (ratio of elapsed time to physical; lower is better) [Bar chart: y-axis "Ratio to Native", 0 to 1.2; series "1 VM" and "2 VMs"]
  • 9. Evolution of Hadoop on VMs • Hadoop in VM (current Hadoop: combined storage/compute): VM lifecycle determined by Datanode; NOT elastic; limited to Hadoop multi-tenancy • Separate Storage: separate compute from data; elastic compute; enable shared workloads; raise utilization • Separate Compute Clusters: separate virtual clusters per tenant; stronger VM-grade security and resource isolation; enable deployment of multiple Hadoop runtime versions
  • 10. 1. Hadoop Task Tracker and Data Node in a VM [Diagram: a virtual Hadoop node (Task Tracker with slots, Datanode on a VMDK) beside another workload on a virtualization host; callouts "Add/Remove Slots?" and "Grow/Shrink by tens of GB?"] Grow/Shrink of a VM is one approach
  • 11. 2. Add/remove Virtual Nodes [Diagram: two virtual Hadoop nodes, each with a Task Tracker (slots) and a Datanode on its own VMDK, beside another workload on a virtualization host] Just add/remove more virtual nodes?
  • 12. But State makes it hard to power-off a node [Diagram: one virtual Hadoop node (Task Tracker with slots, Datanode on a VMDK) beside another workload on a virtualization host] Powering off the Hadoop VM would in effect fail the datanode
  • 13. Adding a node needs data… [Diagram: two virtual Hadoop nodes, each with a Task Tracker (slots) and a Datanode on its own VMDK, beside another workload on a virtualization host] Adding a node would require TBs of data replication
  • 14. 2. Separated Compute and Data [Diagram: several virtual Hadoop nodes running a Task Tracker with slots, a virtual Hadoop node running the Datanode on VMDKs, and another workload, all on one virtualization host] Truly Elastic Hadoop: Scalable through virtual nodes
  • 15. Dataflow with separated Compute/Data [Diagram: a virtual Hadoop node running a NodeManager (slots) and a virtual Hadoop node running a Datanode, each with a virtual NIC; virtual switch, VMDK and NIC drivers on the virtualization host]
  • 16. Performance Analysis of Split • Combined: 1 combined Compute/Datanode VM per host • Split: 1 Datanode VM and 1 Compute node VM per host • Workload: Teragen, Terasort, Teravalidate • HW configuration: 8 cores, 96GB RAM, 16 disks per host x 2 nodes
  • 17. Performance Analysis of Split (elapsed time: ratio to combined) [Bar chart: y-axis 0 to 1.2; series "Combined" and "Split"; benchmarks Teragen, Terasort, Teravalidate]
  • 18. Tying it together: Elastic Hadoop [Diagram: tenants "Coke" and "Pepsi"; a runtime layer of virtual Hadoop instances and a queue; a data layer of namespaces on a distributed file system (HDFS, KFS, GPFS, MAPR, Isilon, …); hosts underneath]
  • 20. Demo: Shrink/Expand Cluster. Setup: 1 Datanode, 2 NodeManagers and 2 web servers on each physical host [Diagram: 8 web servers, 8 NodeManagers and 4 Datanodes]
  • 21. Demo: Shrink/Expand Cluster. When web load is high in the daytime, we can suspend some NodeManagers and power on more web servers. [Diagram: 8 web servers, 8 NodeManagers and 4 Datanodes]
  • 22. Demo
  • 24. Expand Hadoop Ecosystem • Hortonworks goal – Expand Hadoop ecosystem – Provide first-class support of various platforms • Hadoop should run well on VMs • VMs offer several advantages as presented earlier • Take advantage of vSphere for HA
  • 25. VMware-Hortonworks Joint Engineering • First-class support for VMs – Topology plugins (HADOOP-8468) • 2 VMs can be on the same host – Pick closer data – Schedule tasks closer – Don’t put two replicas on the same host – MR-tmp on HDFS using block pools • Elastic Compute-VMs will not need local disk – Fast communications within VMs
  • 26. Hadoop Total System Availability Architecture [Diagram labels: slave nodes of the Hadoop cluster running jobs; apps running outside; failover; JT into safemode; NN and JT on servers in an HA cluster for master daemons; N+K failover]
  • 27. HA is coming in 1.0, Using Total System Availability Architecture © Hortonworks Inc. 2011
  • 28. HA in Hadoop 1 with HDP1 • Total System Availability Architecture – Namenode • Clients pause automatically • JobTracker pauses automatically – Other Hadoop master services (JT, …) coming • Use industry-proven HA framework – VMware vSphere HA • Failover, fencing, … • Corner cases are tricky; if not addressed, corruption – Additional benefits: • N-N & N+K failover • Migration for maintenance
  • 29. Hadoop NN/JT HA with vSphere
  • 30. NameNode HA – Failover Times • NameNode failover times with vSphere and LinuxHA – Failure detection + failover: 0.5 to 2 minutes – OS bootup needed for vSphere: 10-20 seconds – Namenode startup (exit safemode) • Small/medium clusters: 1 to 2 minutes • Large cluster: 5 to 15 minutes • NameNode startup time measurements – 60 nodes, 60K files, 6 million blocks, 300 TB raw storage: 40 sec – 180 nodes, 200K files, 18 million blocks, 900 TB raw storage: 120 sec • Cold failover is good enough for small/medium clusters • Failure detection and automatic failover dominates
  • 31. Demo
  • 32. Summary • Advantages of Hadoop on VMs – Cluster management – Cluster consolidation – Greater elasticity in mixed environment – Alternate multi-tenancy to capacity scheduler’s offerings • HA for Hadoop Master Daemons – vSphere-based HA for NN, JT, … in Hadoop 1 – Total System Availability Architecture
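
The failover timings on slide 30 can be added up into a rough end-to-end budget. The sketch below is illustrative only: it assumes the three phases run strictly one after another, which the slides do not state, and the names `failover_budget`, `DETECTION_AND_FAILOVER`, `OS_BOOT` and `NAMENODE_STARTUP` are hypothetical.

```python
# Phase durations in seconds as (low, high), copied from slide 30.
DETECTION_AND_FAILOVER = (30, 120)   # failure detection + failover: 0.5 to 2 minutes
OS_BOOT = (10, 20)                   # OS bootup needed for vSphere: 10-20 seconds
NAMENODE_STARTUP = {
    "small/medium": (60, 120),       # 1 to 2 minutes
    "large": (300, 900),             # 5 to 15 minutes
}


def failover_budget(cluster_size):
    """Return (best, worst) total cold-failover time in seconds.

    Assumption (not from the slides): the phases are sequential,
    so their durations simply add.
    """
    phases = (DETECTION_AND_FAILOVER, OS_BOOT, NAMENODE_STARTUP[cluster_size])
    best = sum(low for low, _ in phases)
    worst = sum(high for _, high in phases)
    return best, worst


if __name__ == "__main__":
    for size in NAMENODE_STARTUP:
        best, worst = failover_budget(size)
        print(f"{size}: {best}s to {worst}s")
```

Under that additive assumption, a small/medium cluster would be back in roughly 100 to 260 seconds, while a large cluster would take roughly 340 to 1040 seconds because NameNode startup grows with cluster size.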

Editor's Notes

  1. Hybrid storage. Local disks: retains fault domains of individual disks
  2. Data: can I read what I wrote, is the service available? When I asked one of the original authors of GFS if there were any decisions they would revisit: random writers. Simplicity is key. Raw disk: filesystems take time to stabilize; we can take advantage of ext4, xfs or zfs
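
The "$1M gets" capacity figures in the storage comparison on slide 6 follow directly from the per-gigabyte prices. The sketch below reproduces that arithmetic; it assumes the low end of each quoted price range and decimal units (1 PB = 1,000,000 GB), neither of which the slide states, and the names `PRICE_CENTS_PER_GB` and `petabytes_for_budget` are hypothetical.

```python
BUDGET_CENTS = 1_000_000 * 100   # the $1M budget from slide 6, in cents
GB_PER_PB = 1_000_000            # assumption: decimal units

# Low end of each price range on slide 6, in cents per gigabyte.
PRICE_CENTS_PER_GB = {
    "SAN Storage": 200,    # $2/GB
    "NAS Filers": 100,     # $1/GB
    "Local Storage": 5,    # $0.05/GB
}


def petabytes_for_budget(price_cents_per_gb, budget_cents=BUDGET_CENTS):
    """Return how many petabytes the budget buys at the given price."""
    gigabytes = budget_cents // price_cents_per_gb
    return gigabytes / GB_PER_PB


if __name__ == "__main__":
    for tier, price in PRICE_CENTS_PER_GB.items():
        print(f"{tier}: {petabytes_for_budget(price)} PB per $1M")
```

Prices are held in integer cents so that the division stays exact. With these inputs the three tiers come out at 0.5, 1 and 20 petabytes, matching the slide.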