A Dell EMC Best Practices Guide
Dell EMC PowerEdge and BlueData
Enabling Big-Data-as-a-Service and Exploratory Analytics on Dell EMC PowerEdge
Compute Platforms, Ready Solutions, and Isilon Storage
Kris Applegate (kris.applegate@dell.com)
Dell EMC Customer Solution Center
October 2017
2 Dell EMC PowerEdge and BlueData | version (2.0)
Revisions
Date | Version | Description
February 2016 | 1.0 | Initial release
October 2017 | 2.0 | Updated release
- Updated to the new Dell EMC brand.
- Incorporated new 14th Generation of Dell EMC PowerEdge Servers
- Added configurations for existing Hadoop and Isilon
- Added design accommodations for new features in EPIC 3.0
The information in this publication is provided “as is.” Dell Inc. makes no representations or warranties of any kind with respect to the information in this
publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.
Use, copying, and distribution of any software described in this publication requires an applicable software license.
Copyright © 2017 Dell Inc. or its subsidiaries. All Rights Reserved. Dell, EMC, and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other
trademarks may be the property of their respective owners. Published in the USA [10/2/2017] [Best Practices Guide]
Dell believes the information in this document is accurate as of its publication date. The information is subject to change without notice.
Table of contents
Revisions
Executive Summary
1 Introduction
2 BlueData EPIC Server Roles
2.1 Controllers
2.2 Workers
2.3 Gateways
3 Host Storage Types
3.1 Operating System (OS)
3.2 Node Storage
3.3 Data Storage
4 Recommended Deployment Models
4.1 Local Storage
4.2 Remote Storage on Dell EMC Hadoop Ready Bundle
4.3 Remote Storage on Dell EMC Isilon
5 Component Builds
5.1 Recommended Host Counts
5.2 Controller/Worker - Local Storage (High Capacity)
5.3 Controller/Worker - Local Storage (High Performance)
5.4 Controller/Worker – Existing Ready Bundle / Isilon
5.5 Gateways (Optional)
5.6 Management Switches
5.7 Data Switches
6 Additional Configuration Considerations
6.1 Host Count
6.2 Node Storage
6.3 Data Storage and Platform Type
6.4 Host Networking
6.5 Ready Solution Speed / Capacity
6.6 Isilon Model
7 Conclusion
A Tested Configuration Details
B Dell EMC Customer Solution Centers
C Technical support and resources
C.1 Related resources
Executive Summary
Making data-driven decisions is no longer optional for businesses; it is required. The sheer number of choices in
data processing platforms, for both structured and unstructured data, is overwhelming. Knowledge workers in
customers’ lines of business require IT organizations to rapidly architect, provision, and maintain multiple sets
of tools. If an IT team fails to meet this demand, users may try to stand up their own “unapproved / unmanaged”
clusters or, at greater risk, move data out to ungoverned public clouds where company intellectual property and
customer data could be exposed.
Pairing BlueData’s EPIC software with Dell EMC’s portfolio of compute, storage, and networking options allows
customers to consolidate multiple data analytics clusters into one shared platform. This results in simpler
infrastructure, comprehensive control of data, and the agility to spin up new and emerging technologies easily.
The solution lets IT focus on helping the company differentiate itself at the workload layer, rather than spending
time at the infrastructure plumbing layer.
1 Introduction
Making data-driven decisions is no longer optional for businesses; it is required. The sheer number of choices in
data processing platforms, for both structured and unstructured data, is overwhelming. Knowledge workers in
customers’ lines of business require IT organizations to rapidly architect, provision, and maintain multiple sets
of tools. If an IT team fails to meet this demand, users may try to stand up their own “unapproved / unmanaged”
clusters or, at greater risk, move data out to ungoverned public clouds where company intellectual property and
customer data could be exposed.
[Figure: Current Data Analytics Cluster State]
Pairing BlueData’s EPIC software with Dell EMC’s portfolio of compute, storage, and networking options allows
customers to consolidate multiple data analytics clusters into one shared platform. This results in simpler
infrastructure, comprehensive control of data, and the agility to spin up new and emerging technologies easily.
The solution lets IT focus on helping the company differentiate itself at the workload layer, rather than spending
time at the infrastructure plumbing layer.
[Figure: BlueData EPIC]
2 BlueData EPIC Server Roles
Below we describe the major server roles that are needed to stand up a production BlueData cluster. Updated sizing
guidance can always be found in the most recent documentation from BlueData.
2.1 Controllers
These hosts are the initial installation point of BlueData and the key orchestrators in instantiating the containerized
clusters on the Worker hosts. While a single host can act as the only Controller (as well as a Worker host), we recommend
that you deploy three Controller hosts in a highly available configuration of master, standby, and arbiter. By default, all
Controllers are provisioned as Workers as well, though some customers’ requirements may dictate that container
provisioning be disabled on the Controller hosts in order to dedicate them to orchestration only. These hosts can also
serve as network routers between the containerized internal overlay network and the outside world.
2.2 Workers
Workers form the backbone of the compute infrastructure and provide the disk space on which the containerized
hosts spin up their scratch/ephemeral storage. In the Local Storage configuration, these Worker hosts also
hold the disks dedicated to providing the Data Storage capacity. Choose a platform here that offers multiple
disk options: either large form-factor (LFF) disks suited to high capacity or small form-factor (SFF) disks that
emphasize performance. Size CPU cores and memory for the anticipated number of containers each host is expected to
handle, plus an overhead allowance of 1 core and one-eighth of the allocated RAM plus 1GB. It’s important to
understand that CPU cores can be oversubscribed, but memory can’t be.
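As a rough illustration of the Worker sizing guidance above, the following sketch estimates the physical cores and RAM one Worker host would need. It assumes one reading of the overhead rule (1 core, plus one-eighth of the RAM allocated to containers, plus 1 GB); the container sizes and the CPU oversubscription ratio are hypothetical. BlueData's own sizing documentation remains the authoritative source.

```python
import math
from dataclasses import dataclass


@dataclass
class ContainerSpec:
    """Hypothetical per-container resource request."""
    vcpus: int
    ram_gb: float


def size_worker(containers, cpu_oversubscription=1.0):
    """Estimate physical cores and RAM (GB) for one Worker host.

    Assumed reading of the guidance: overhead is 1 core plus
    (1/8 of the RAM allocated to containers + 1 GB).
    CPU may be oversubscribed; memory may not.
    """
    if cpu_oversubscription < 1.0:
        raise ValueError("oversubscription ratio must be >= 1.0")
    vcpus = sum(c.vcpus for c in containers)
    ram_allocated = sum(c.ram_gb for c in containers)
    # Only the CPU side is divided by the oversubscription ratio.
    cores = math.ceil(vcpus / cpu_oversubscription) + 1
    ram_gb = ram_allocated + ram_allocated / 8 + 1
    return cores, ram_gb


if __name__ == "__main__":
    # Eight hypothetical containers of 4 vCPUs and 32 GB each.
    plan = [ContainerSpec(vcpus=4, ram_gb=32)] * 8
    cores, ram = size_worker(plan, cpu_oversubscription=2.0)
    print(f"physical cores: {cores}, RAM: {ram:.0f} GB")
```

Note that the oversubscription ratio lowers only the core count; the memory figure always covers the full container allocation plus overhead.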
2.3 Gateways
Gateway hosts are optional components that provide the ability to surface network end-points for the internal
containerized clusters. This allows designs to forgo the need for a routable container network through the controllers.
Containerized clusters will terminate their user interface and service end-points on the front-facing IPs of the gateway
hosts, mapped to ports above 10000. These hosts can be very lightweight configurations, or even virtual
machines on existing compute infrastructure. In the case of the Hadoop Ready Bundle solutions, you could deploy them
on the existing Hadoop Edge Nodes.
3 Host Storage Types
Each host in a BlueData cluster uses several key types of local and/or remote storage. One of these storage
types is optionally remote (Data Storage), and the others are provisioned from local disk resources (Operating System and
Node Storage).
3.1 Operating System (OS)
This is the storage on which the host operating system is installed. All 3 roles (Controllers, Workers, and Gateways) have
local operating systems that must be installed prior to BlueData package setup. We recommend, as with any
production deployment, that you mirror the OS using RAID for resiliency. Check BlueData’s software requirements for the
latest supported Linux OS revisions. Note that currently, the minimum amount of storage for the OS volume is 300GB,
which can prevent the use of some OS/boot storage options like the Dell EMC Boot Optimized Storage Subsystem
(BOSS) without manually overriding the installer.
3.2 Node Storage
Node storage is the temporary/ephemeral/scratch space on each Worker host where the containers are instantiated. This
is high-speed storage that is local to each Worker host. Node storage space is required regardless of deployment model.
It must consist of unformatted volumes that are presented to the Linux OS (e.g. /dev/sda, /dev/vda, /dev/nvme0n1). It’s
important to remember that you must present each drive individually rather than in aggregate (RAID). On Dell
PowerEdge RAID Controllers (PERC), this means flagging all the drives as “Non-RAID”. Detailed sizing considerations can
be found in the BlueData System Architecture section of the documentation.
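One way to sanity-check a host before installation is to list whole disks that carry no filesystem, mount point, or partitions. The sketch below does this from the JSON report of `lsblk`. It is not a BlueData tool; the function names are illustrative, and the sample report is hand-written rather than captured from a real host.

```python
import json
import subprocess


def candidate_disks(lsblk_report):
    """Return /dev paths of whole disks with no filesystem, mount, or partitions."""
    found = []
    for dev in lsblk_report.get("blockdevices", []):
        if dev.get("type") != "disk":
            continue
        if dev.get("fstype") or dev.get("mountpoint") or dev.get("children"):
            continue
        found.append("/dev/" + dev["name"])
    return found


def read_lsblk():
    """Run lsblk and parse its JSON report (needs a util-linux with --json)."""
    out = subprocess.run(
        ["lsblk", "--json", "--output", "NAME,TYPE,FSTYPE,MOUNTPOINT"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


if __name__ == "__main__":
    # Hand-written sample: sda holds the OS, sdb is blank, nvme0n1 is formatted.
    sample = {
        "blockdevices": [
            {"name": "sda", "type": "disk", "fstype": None, "mountpoint": None,
             "children": [{"name": "sda1", "type": "part",
                           "fstype": "xfs", "mountpoint": "/"}]},
            {"name": "sdb", "type": "disk", "fstype": None, "mountpoint": None},
            {"name": "nvme0n1", "type": "disk", "fstype": "xfs", "mountpoint": None},
        ]
    }
    print(candidate_disks(sample))
```

On a real host, `candidate_disks(read_lsblk())` would apply the same filter to live data. A RAID virtual disk also appears as a plain disk to the OS, so this check does not confirm that the drives were flagged Non-RAID on the controller.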
3.3 Data Storage
The storage that each tenant uses to store source data and results is referred to as Data storage. This can be made
up of multiple local drives or of remote storage on an existing Hadoop cluster (HDFS or Kerberos-secured HDFS) or NAS
(NFS). If using local storage, these drives must also be Non-RAID and unformatted. When using remote storage, you’ll use
the BlueData DataTap mechanism to transparently attach over IP.
Below is an example of how you can carve up the local drives on a Dell EMC PowerEdge R740XD to provide all 3 of the
storage roles needed for BlueData. This serves as just one example of a hardware configuration; further examples for
additional use-cases and performance characteristics are detailed in Sections 4 and 5.
[Figure: Example Disk Layout for Controller/Worker - Local Storage (High Performance). Drive slots Drive0 through
Drive31 span the Front Hot-Plug, Mid-Bay, and Flexbay positions, each marked as OS, Node, or Data storage.]
4 Recommended Deployment Models
The flexibility of the BlueData platform shines as you begin to look at the possible deployment models. Below
you’ll see 3 recommended models, ranging from all-local storage through to remote storage provided by either existing
Hadoop clusters or Dell EMC Isilon’s scale-out NAS capabilities. Choose a model after
consulting your Dell EMC and BlueData sales support resources (as well as the Dell EMC Customer Solution
Center).
4.1 Local Storage
In this model, each Worker and Controller/Worker has local storage for all three storage types (OS, Node, and Data),
resulting in a simpler architecture. The key differentiator in this deployment model is that local storage is allocated
for tenant data. Local disks (usually rotational) make up the BlueData HDFS instance that is shared across all
tenants. Each tenant is restricted from viewing other tenants’ files, but all files reside in the single HDFS
namespace that BlueData provides. Different server models can be used here to provide a platform that is optimized for
either high capacity (LFF) or high performance (SFF). However, we do recommend that all Worker hosts have the same
disk configuration and layout.
[Figure: Dell EMC PowerEdge R740XD with Large Form-factor Drives (Top) and Small Form-factor Drives (Bottom) –
Bezels Removed]
[Figure: Local Storage Deployment Model. Shows Controllers / Workers (Master, Standby, Arbiter) and Workers with
drives marked as OS, Node, or Data; Data Switches; Management Switches connected to the iDRACs; optional Gateway(s);
and the Corporate Network.]
4.2 Remote Storage on Dell EMC Hadoop Ready Bundle
Many customers have existing, robust, hardened primary Hadoop clusters that they would like to leverage for storage. In
this deployment model you can use BlueData as the Controller/Compute tier and leave your source data and result data
on the remote HDFS solution. Dell EMC has a long history of providing validated reference configurations of both
Cloudera and Hortonworks distributions of Hadoop. These solutions can save IT departments hours of valuable time by
taking care of racking, provisioning the operating systems, and installing and configuring the Hadoop
distribution, allowing IT to focus on the workloads that differentiate their company.
[Figure: Remote Storage on Dell EMC Ready Bundle Deployment Model. Shows Controllers / Workers (Master, Standby,
Arbiter) and Workers with drives marked as OS or Node; Data Storage on the existing Hadoop cluster with its Name
Node(s); Data Switches; Management Switches connected to the iDRACs; optional Gateway(s); and the Corporate Network.]
4.3 Remote Storage on Dell EMC Isilon
Dell EMC Isilon’s multi-protocol scale-out storage capabilities can be leveraged to provide a remote filesystem with a
significantly lower total cost of ownership (TCO). By reducing the need for multiple software replicas and sharing the same
data across SMB, NFS, and HDFS, Isilon can reduce the complexity of a remote storage solution while allowing the
introduction of a robust portfolio of data management options.
[Figure: Remote Storage on Dell EMC Isilon Deployment Model. Shows Controllers / Workers (Master, Standby, Arbiter)
and Workers with drives marked as OS or Node; Data Storage on Isilon; Data Switches; Management Switches connected to
the iDRACs; optional Gateway(s); and the Corporate Network.]
5 Component Builds
5.1 Recommended Host Counts
Components | Local Storage | Remote Storage on Dell EMC Hadoop Ready Bundle | Remote Storage on Dell EMC Isilon
Controller/Worker Hosts (with local data storage) | 3 (High Capacity or Performance) | N/A | N/A
Worker Only Hosts (with local data storage) | 7 (High Capacity or Performance) | N/A | N/A
Controller/Worker Hosts (without local data storage) | N/A | 3 | 3
Worker Only Hosts (without local data storage) | N/A | 7 | 7
Gateways | 2 | 2 | 2
Management Switches | 2 | 2 | 2
Data Switches | 2 | 2 | 2
External Storage Solution | N/A | Dell EMC Ready Bundle for Hadoop | Dell EMC Isilon Solution
Usable Node Storage | 7.68 TB (SSD) | 7.68 TB (SSD) | 7.68 TB (SSD)
Usable Data Storage | 21.3 TB (Capacity), 11.2 TB (Performance) | Varies Depending on Existing Hadoop’s Capacity | Varies Depending on Isilon Configuration
Recommended Host Counts
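The usable-storage figures in this table appear to be per-host values: raw drive capacity, divided by three replicas for Data storage (Section 6.6 mentions 3 replicas in the Local Storage configuration). The sketch below reproduces that arithmetic using the drive counts from Sections 5.2 and 5.3; the division by three is an assumption about how the figures were derived, not something the guide states. Under that assumption, 28x 1.8TB drives would give 16.8 TB, while the 11.2 TB in the table corresponds to 28x 1.2TB drives. This document does not say which drive size is intended.

```python
def usable_tb(drive_count, drive_tb, replicas=1):
    """Raw capacity of one host divided by the number of replicas kept."""
    return drive_count * drive_tb / replicas


if __name__ == "__main__":
    print(f"Node storage (2x 3.84TB):        {usable_tb(2, 3.84):.2f} TB")
    print(f"Data, High Capacity (16x 4TB):   {usable_tb(16, 4.0, replicas=3):.1f} TB")
    print(f"Data, High Perf. (28x 1.8TB):    {usable_tb(28, 1.8, replicas=3):.1f} TB")
    print(f"Data, if drives were 28x 1.2TB:  {usable_tb(28, 1.2, replicas=3):.1f} TB")
```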
5.2 Controller/Worker - Local Storage (High Capacity)
Component | Details
Server Model | Dell EMC PowerEdge R740XD
Chassis | Chassis with up to 12x3.5" HDD, 4x3.5" in Mid-bay, and 4x2.5" in Flexbay
Processor(s) | Dual Intel® Xeon® Gold 6136 3.0G, 12C/24T, 24.75M Cache
Memory | 384 GB RAM (12x 32GB DIMMs)
Disk Controller | Dell PERC H730P
OS Storage | 2x 2.5” 600GB 10K SAS Drives (Flexbay) – RAID 1 (Mirror)
Node Storage | 2x 2.5” 3.84TB Mixed Use SAS SSD Drives (Flexbay) – Non-RAID
Data Storage | 16x 3.5” 4TB 7.2K SATA (12x Front and 4x Mid-bay) – Non-RAID
Network Card | Intel X710 Quad Port 10GbE Network Daughter Card
Controller/Worker - Local Storage (High Capacity)
5.3 Controller/Worker - Local Storage (High Performance)
Component | Details
Server Model | Dell EMC PowerEdge R740XD
Chassis | Chassis with up to 24x2.5" HDD, 4x2.5" in Mid-bay, and 4x2.5" in Flexbay
Processor(s) | Dual Intel® Xeon® Gold 6136 3.0G, 12C/24T, 24.75M Cache
Memory | 384 GB RAM (12x 32GB DIMMs)
Disk Controller | Dell PERC H740P
OS Storage | 2x 2.5” 600GB 10K SAS Drives (Flexbay) – RAID 1 (Mirror)
Node Storage | 2x 2.5” 3.84TB Mixed Use SAS SSD Drives (Flexbay) – Non-RAID
Data Storage | 28x 2.5” 1.8TB 10K SAS (24x Front and 4x Mid-bay) – Non-RAID
Network Card | Intel X710 Quad Port 10GbE Network Daughter Card
Controller/Worker - Local Storage (High Performance)
5.4 Controller/Worker – Existing Ready Bundle / Isilon
Component | Details
Server Model | Dell EMC PowerEdge R740XD
Chassis | Chassis with up to 24x2.5" HDD
Processor(s) | Dual Intel® Xeon® Gold 6136 3.0G, 12C/24T, 24.75M Cache
Memory | 384 GB RAM (12x 32GB DIMMs)
Disk Controller | Dell PERC H740P
OS Storage | 2x 2.5” 600GB 10K SAS Drives – RAID 1 (Mirror)
Node Storage | 2x 2.5” 3.84TB Mixed Use SAS SSD Drives – Non-RAID
Data Storage | 1x Dell EMC Isilon Generation 6 Chassis with 4x Dell EMC Isilon H500, each with 15x 4TB SATA Drives and 2x 1.6TB SSD
Network Card | Intel X710 Quad Port 10GbE Network Daughter Card
Controller/Worker – Existing Ready Bundle / Isilon
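The Isilon Data Storage configuration above lends itself to a quick raw-capacity check. The sketch below multiplies out the drive counts per node for the H500 listed here and for the H600 and F800 options named in Section 6.6. It assumes four nodes per chassis for all three models (this guide states that only for the H500), leaves the 2x 1.6TB SSDs out of the total on the assumption that they do not hold tenant data, and reports raw capacity only. It does not model Isilon's data protection overhead.

```python
# Drive counts and sizes as listed in this guide. Four nodes per chassis is
# stated for the H500 only and is an assumption for the H600 and F800.
NODES_PER_CHASSIS = 4

MODELS = {
    "H500": (15, 4.0),  # 15x 4TB SATA per node
    "H600": (30, 1.2),  # 30x 1.2TB SAS per node
    "F800": (15, 3.2),  # 15x 3.2TB SSD per node
}


def raw_tb_per_chassis(model, nodes=NODES_PER_CHASSIS):
    """Raw data-drive capacity in TB for one chassis of the given model."""
    drives, size_tb = MODELS[model]
    return nodes * drives * size_tb


if __name__ == "__main__":
    for name in MODELS:
        print(f"{name}: {raw_tb_per_chassis(name):.0f} TB raw per chassis")
```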
5.5 Gateways (Optional)
The optional gateway hosts can be very lightweight servers whose configuration depends heavily on the customer
use-case. A generic configuration is listed below that details the recommended CPU core count and RAM amounts.
Component | Details
CPU Core Count | 8 cores
Memory | 32GB
OS Storage | 100 GB
Gateways (Optional)
5.6 Management Switches
Component | Details
Switch Model | Dell EMC Networking S3148-ON
Management Switches
5.7 Data Switches
Component | Details
Switch Model | Dell EMC Networking S4148F-ON
Data Switches
6 Additional Configuration Considerations
6.1 Host Count
This document outlines 3 configurations that allow for 3 different methods of storing persistent data. The amount of
compute resources in each configuration (processor core count and RAM) has been kept the same so that each
configuration can instantiate the same number of containerized applications. However, in the
remote storage configurations (Remote Hadoop and Dell EMC Isilon), the local storage processing overhead is
offloaded to those storage systems, so the amount of compute and memory needed could diminish. With that in mind, you
could see the same workload performance in those configurations as with the local storage configuration while
using a lower total host count.
6.2 Node Storage
The drives providing Node storage are the best place to make an investment in speed. Options range from faster / larger
SSDs up to PCIe NVMe devices with microsecond latency. Improving the speed here will improve the speed of
provisioning containers and the speed at which those containers can execute storage operations, and will reduce the
latency of transactions. The sizing of this space must take into account how many containers of each solution stack
will be deployed on each host. The advantage of containerization here is that you’re not penalized for the OS portion
of each containerized host, since it is shared among all other containers that use the same base OS image. This means
that the only things that distinguish one container from another are the deltas to the base image.
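The effect of sharing a base image can be sketched as a simple footprint estimate: each distinct image is counted once per host, and every container adds only its own delta. The image names, sizes, and container counts below are hypothetical placeholders, not measurements from BlueData.

```python
from collections import Counter


def node_storage_gb(containers, base_image_gb, delta_gb):
    """Estimate Node storage used on one host.

    containers: list of image names, one entry per container on the host.
    base_image_gb: size of each base image, assumed to be stored once per host.
    delta_gb: assumed writable delta per container, keyed by image name.
    """
    counts = Counter(containers)
    shared = sum(base_image_gb[image] for image in counts)
    deltas = sum(delta_gb[image] * n for image, n in counts.items())
    return shared + deltas


if __name__ == "__main__":
    # Hypothetical mix: six containers of one stack and four of another.
    containers = ["spark"] * 6 + ["hadoop"] * 4
    base = {"spark": 10, "hadoop": 12}
    delta = {"spark": 50, "hadoop": 80}
    print(node_storage_gb(containers, base, delta), "GB")
```

Without sharing, the same mix would pay for ten base images rather than two, which is the saving the paragraph above describes.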
6.3 Data Storage and Platform Type
The amount of necessary data storage will vary depending on customer use-case. The Dell EMC PowerEdge R740XD
compute platform allows for multiple configurations across both the performance and capacity dimensions. However,
there are a number of additional optional platforms you could consider:
Controller/Worker – Local Storage
> Cost-Optimized Local Storage – Dell EMC PowerEdge R540
> Capacity-Optimized Local Storage – Dell EMC PowerEdge R740XD with larger drives (limit to less than 100TB raw per host as a best practice)
> Density-Optimized Local Storage – Dell EMC PowerEdge R640
> Modular Local Storage – Dell EMC PowerEdge FX2 with Dell EMC PowerEdge FC640 and Dell EMC PowerEdge FD332
Controller/Worker
> Cost-Optimized Compute Only – Dell EMC PowerEdge R440
> Density-Optimized Compute Only – Dell EMC PowerEdge R640
> Modular Compute Only – Dell EMC PowerEdge FX2 with Dell EMC PowerEdge FC640
> Hyper-scale – Dell EMC PowerEdge C6420
6.4 Host Networking
The amount of bandwidth delivered to each host should take into consideration both storage traffic (host to host if using
local storage and remote storage traffic if using a remote solution like an existing Hadoop or Isilon). In our recommended
configurations, we allocated a bonded set of four (4) 10GbE links for both throughput and failover accommodations. There
are many additional ways to deliver substantial bandwidth to Dell EMC PowerEdge platforms including multiple 40GbE
links as well as the emerging 25GbE solutions. 25GbE shows particular promise, as its cost-to-performance ratio is
unmatched by the other technologies.
As you scale this solution to multiple racks, we recommend using a leaf/spine design to allow for scale-out capabilities.
You can look to Dell EMC’s wide portfolio of aggregation tier switches at 40GbE and 100GbE to provide the spine for your
topology.
In our recommended configurations, each host carries both the Container and the Management traffic on the same
interface. You could split this traffic using VLAN tagging, or divide the four 10GbE ports into two 2x 10GbE trunks, with one
trunk handling the management traffic and the other handling the container traffic.
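The bandwidth trade-offs of the options above come down to simple arithmetic, sketched below in Python. The function name is hypothetical, and the result is an aggregate line rate only. Achievable throughput depends on the bonding mode and traffic hashing, and a single flow is typically limited to the speed of one member link.

```python
def bond_bandwidth_gbps(links, link_speed_gbps, failed_links=0):
    """Aggregate line rate of a bonded set of links, optionally with failed members."""
    if not 0 <= failed_links <= links:
        raise ValueError("failed_links must be between 0 and links")
    return (links - failed_links) * link_speed_gbps


# Recommended configuration: one bond of four 10GbE links per host.
full_bond = bond_bandwidth_gbps(4, 10)
# The same bond after losing one member link.
degraded_bond = bond_bandwidth_gbps(4, 10, failed_links=1)
# Alternative: two separate 2x 10GbE trunks (management and container traffic).
per_trunk = bond_bandwidth_gbps(2, 10)

print(f"Single 4x 10GbE bond: {full_bond} Gb/s")
print(f"Bond with one failed link: {degraded_bond} Gb/s")
print(f"Each 2x 10GbE trunk: {per_trunk} Gb/s")
```

Splitting the ports isolates the two traffic types, at the cost of capping each type at the line rate of its own trunk rather than the full four-link bond.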
6.5 Ready Solution Speed / Capacity
Dell EMC’s Ready Solutions for Hadoop offer multiple scaling options for performance and modularity. When
implementing this solution, you can scale the Ready Solution storage independently of the BlueData
compute solution. If you are looking to improve remote storage speed, you could scale either the network or the Hadoop
worker host disk speed. If you are looking for capacity, you can scale the number of Hadoop worker hosts in the remote
solution or increase the drive capacity in the worker hosts.
6.6 Isilon Model
With a broad portfolio of models, the Dell EMC Isilon product family has many options to help you optimize your solution
for speed, capacity, or any mix of the two. All-flash models can significantly improve storage latency. Nodes with
larger, slower drives can help optimize your solution for storage capacity without the need for the physical space that the
three replicas require in the Local Storage configuration.
> Capacity Focused – Dell EMC Isilon H500 each with 15x 4TB SATA and 2x 1.6TB SSD
> Performance (Disk) Focused – Dell EMC Isilon H600 each with 30x 1.2TB SAS and 2x
1.6TB SSD
> Performance (All-Flash) Focused – Dell EMC Isilon F800 each with 15x 3.2TB SSD
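To compare the models listed above, the raw capacity per node can be tallied from the drive counts, and the raw space a three-replica Local Storage configuration would need can be computed for the same amount of data. The Python sketch below is an illustration only. It leaves the 2x 1.6TB SSDs of the H500 and H600 out of the totals, on the assumption that they are not used for bulk capacity, and it does not model Isilon's own protection overhead, which depends on the protection level chosen.

```python
# Drive counts and sizes (TB) taken from the model list above.
ISILON_NODES = {
    "H500": (15, 4.0),  # 15x 4TB SATA
    "H600": (30, 1.2),  # 30x 1.2TB SAS
    "F800": (15, 3.2),  # 15x 3.2TB SSD
}


def raw_capacity_tb(model):
    """Raw capacity of one node, before any data protection overhead."""
    drive_count, drive_tb = ISILON_NODES[model]
    return drive_count * drive_tb


def local_raw_needed_tb(data_tb, replicas=3):
    """Raw local disk needed to hold data_tb when HDFS keeps `replicas` copies."""
    return data_tb * replicas


for model in ISILON_NODES:
    print(f"{model}: {raw_capacity_tb(model):.1f} TB raw per node")

# Hypothetical data set size, used only to show the replica multiplier.
data_tb = 60.0
print(f"{data_tb:.0f} TB of data needs {local_raw_needed_tb(data_tb):.0f} TB raw with 3 replicas")
```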
7 Conclusion
As many organizations are finding out, flexibility and agility are keys to success when it comes to data analytics. This
recommended configuration for BlueData on Dell EMC PowerEdge Servers, Ready Solutions, and Isilon provides the
ability to instantiate many of the latest and greatest data analytics tools of today, while providing the infrastructure to build
on for the tools of tomorrow. Consolidating your data analytics resources into a single shared resource pool provides the
ability to keep data under control and resources efficiently utilized, which in turn allows you to focus your resources where
they matter the most, not on the infrastructure plumbing.
A Tested Configuration Details
The below configuration was stood up in the Dell EMC Customer Solution Centers to perform basic validation
routines. This testing consisted of running in both Local Storage mode as well as attaching to an existing
HDFS filesystem provided by a Dell EMC Ready Solution for Cloudera Hadoop cluster.
Component                          
Server Model                       Dell EMC PowerEdge R740XD
Chassis                            Chassis with up to 24x 2.5" HDD including 12 NVMe Drive Bays (not used)
Processor(s)                       Dual Intel® Xeon® Gold 6132 2.6G, 14C/28T, 19M Cache
Memory                             192 GB RAM (12x 16GB DIMMs)
Disk Controller                    Dell PERC HBA330
OS Storage                         1x 2.5" 1.8TB 10K SAS Drive
Node Storage                       2x 2.5" 1.8TB 10K SAS Drives
Data Storage                       9x 2.5" 1.8TB 10K SAS Drives
Network Card                       Intel X710 Quad Port 10GbE Network Daughter Card
Operating System Version           CentOS Server 7.3.1611 (64-bit)
BlueData EPIC Version              3.0.5 Build 2158
Tested Configuration - Controller/Worker Hosts

Component                          
Switch Model                       Dell EMC Networking S3048-ON
Tested Configuration - Management Switch

Component                          
Switch Model                       Dell EMC Networking S4048-ON
Tested Configuration - Data Switch

Component                          
Dell EMC Ready Bundle Revision     Dell EMC Ready Bundle for Cloudera Hadoop 5.10
Cloudera Hadoop Revision           Cloudera CDH 5.10
Tested Remote Storage Solution Version
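The per-host figures in the Controller/Worker host table above can be totaled for planning purposes. The Python sketch below derives per-host cores, threads, memory, and raw Data Storage from the tested configuration, then scales them to a cluster. The host count of five is a hypothetical example, not the size of the tested cluster, and the raw storage figure does not account for HDFS replication.

```python
# Per-host values copied from the tested Controller/Worker configuration table.
TESTED_HOST = {
    "sockets": 2,            # dual Intel Xeon Gold 6132
    "cores_per_socket": 14,  # 14C/28T per processor
    "threads_per_core": 2,
    "dimm_count": 12,
    "dimm_gb": 16,
    "data_drives": 9,        # Data Storage drives only
    "data_drive_tb": 1.8,
}


def host_totals(host):
    """Total cores, threads, memory, and raw Data Storage for one host."""
    cores = host["sockets"] * host["cores_per_socket"]
    return {
        "cores": cores,
        "threads": cores * host["threads_per_core"],
        "memory_gb": host["dimm_count"] * host["dimm_gb"],
        "raw_data_tb": round(host["data_drives"] * host["data_drive_tb"], 1),
    }


def cluster_totals(host, host_count):
    """Scale the per-host totals to a cluster of identical hosts."""
    per_host = host_totals(host)
    return {key: round(value * host_count, 1) for key, value in per_host.items()}


print("Per host:", host_totals(TESTED_HOST))
# Hypothetical cluster size, chosen only for illustration.
print("Five hosts:", cluster_totals(TESTED_HOST, 5))
```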
B Dell EMC Customer Solution Centers
The Dell EMC Customer Solution Centers are a global network of connected labs that allow Dell to help
customers architect, validate, and build solutions. With footprints in every region, they can help you
through an informal 30-60 minute briefing, a longer half-day workshop, or a full proof-of-concept that
lets you kick the tires of a solution prior to signing on the dotted line. Simply engage with your
account team and have them submit a request to get started today.
Figure: Dell EMC Customer Solution Centers Locations and Offerings
> Collaborate (Technical Briefing) – White-boarding and solution conversations. Physical and virtual engagements with global connectivity.
> Innovate (Design Workshop) – Customized technical consultations. Open discussions to help customers explore solution options.
> Validate (Proof of Concept & Design) – Customer use-cases and requirements. Practical hands-on implementation and validation.
C Technical support and resources
Dell.com/support is focused on meeting customer needs with proven services and support.
Dell TechCenter is an online technical community where IT professionals have access to numerous resources
for Dell EMC software, hardware and services.
Storage Solutions Technical Documents on Dell TechCenter provide expertise that helps to ensure customer
success on Dell EMC Storage platforms.
Dell EMC Customer Solution Centers are locations where Dell EMC account teams can bring customers to
engage with our Subject Matter Experts to demonstrate how the Dell EMC portfolio can deliver value.
BlueData EPIC Documentation
C.1 Related resources
Dell Reference Configuration for Dell PowerEdge Servers (Version 1.0 of this document)

Dell EMC PowerEdge and BlueData - Best Practices Guide

  • 1. A Dell EMC Best Practices Guide Dell EMC PowerEdge and BlueData Enabling Big-Data-as-a-Service and Exploratory Analytics on Dell EMC PowerEdge Compute Platforms, Ready Solutions, and Isilon Storage Kris Applegate (kris.applegate@dell.com) Dell EMC Customer Solution Center October 2017
  • 2. 2 Dell EMC PowerEdge and BlueData | version (2.0) Revisions Date Description February 2016 1.0 Initial release October 2017 2.0 Updated release  Updated to the new Dell EMC brand.  Incorporated new 14th Generation of Dell EMC PowerEdge Servers  Added configurations for existing Hadoop and Isilon  Added design accommodations for new features in EPIC 3.0 The information in this publication is provided “as is.” Dell Inc. makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any software described in this publication requires an applicable software license. Copyright © 2017 Dell Inc. or its subsidiaries. All Rights Reserved. Dell, EMC, and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be the property of their respective owners. Published in the USA [10/2/2017] [Best Practices Guide] Dell believes the information in this document is accurate as of its publication date. The information is subject to change without notice.
  • 3. 3 Dell EMC PowerEdge and BlueData | version (2.0) Table of contents Revisions.............................................................................................................................................................................2 Executive Summary............................................................................................................................................................5 1 Introduction...................................................................................................................................................................6 2 BlueData Epic Server Roles.........................................................................................................................................8 2.1 Controllers ..........................................................................................................................................................8 2.2 Workers ..............................................................................................................................................................8 2.3 Gateways............................................................................................................................................................8 3 Host Storage Types......................................................................................................................................................9 3.1 Operation System (OS) ......................................................................................................................................9 3.2 Node Storage......................................................................................................................................................9 3.3 Data Storage.......................................................................................................................................................9 4 Recommended Deployment Models 
..........................................................................................................................11 4.1 Local Storage....................................................................................................................................................11 4.2 Remote Storage on Dell EMC Hadoop Ready Bundle.....................................................................................13 4.3 Remote Storage on Dell EMC Isilon.................................................................................................................14 5 Component Builds ......................................................................................................................................................15 5.1 Recommended Host Counts ............................................................................................................................15 5.2 Controller/Worker - Local Storage (High Capacity) ..........................................................................................15 5.3 Controller/Worker - Local Storage (High Performance) ...................................................................................16 5.4 Controller/Worker – Existing Ready Bundle / Isilon..........................................................................................17 5.5 Gateways (Optional).........................................................................................................................................17 5.6 Management Switches .....................................................................................................................................18 5.7 Data Switches...................................................................................................................................................18 6 Additional Configuration Considerations ....................................................................................................................19 
6.1 Host Count........................................................................................................................................................19 6.2 Node Storage....................................................................................................................................................19 6.3 Data Storage and Platform Type......................................................................................................................19 6.4 Host Networking ...............................................................................................................................................19 6.5 Ready Solution Speed / Capacity.....................................................................................................................20 6.6 Isilon Model.......................................................................................................................................................20 7 Conclusion..................................................................................................................................................................21 A Tested Configuration Details......................................................................................................................................22
  • 4. 4 Dell EMC PowerEdge and BlueData | version (2.0) B Dell EMC Customer Solution Centers........................................................................................................................24 C Technical support and resources ...............................................................................................................................25 C.1 Related resources.............................................................................................................................................25
  • 5. 5 Dell EMC PowerEdge and BlueData | version (2.0) Executive Summary The demand for businesses to make data-driven decisions has become no longer optional, but rather required. The sheer number of choices in data processing platforms, both for structured and unstructured data are overwhelming. Knowledge workers in customer’s lines of business are requiring IT organizations to rapidly architect, provision, and maintain multiple sets of tools. Failure of an IT team to meet this demand could see users trying to stand-up their own “unapproved / unmanaged” clusters or, at a greater risk, move data out in un-governed public clouds where company intellectual property and customer data could be at risk. The benefit of partnering BlueData’s EPIC software with Dell EMC’s powerful portfolio of compute, storage, and networking options allows our customers to consolidate multiple data analytics clusters into one shared platform. This results in simplification of infrastructure, comprehensive control of data, and flexible and agile capabilities around spinning up new and emerging technologies easily. This solution allows IT to focus on helping their company differentiate themselves at the workload layer, rather than spending time at the infrastructure plumbing layer.
  • 6. 6 Dell EMC PowerEdge and BlueData | version (2.0) 1 Introduction The demand for businesses to make data-driven decisions has become no longer optional, but rather required. The sheer number of choices in data processing platforms, both for structured and unstructured data are overwhelming. Knowledge workers in customer’s lines of business are requiring IT organizations to rapidly architect, provision, and maintain multiple sets of tools. Failure of an IT team to meet this demand could see users trying to stand-up their own “unapproved / unmanaged” clusters or, at a greater risk, move data out in un-governed public clouds where company intellectual property and customer data could be at risk. Current Data Analytics Cluster State The benefit of partnering BlueData’s EPIC software with Dell EMC’s powerful portfolio of compute, storage, and networking options allows our customers to consolidate multiple data analytics clusters into one shared platform. This results in simplification of infrastructure, comprehensive control of data, and flexible and agile capabilities around spinning up new and emerging technologies easily. This solution allows IT to focus on helping their company differentiate themselves at the workload layer, rather than spending time at the infrastructure plumbing layer.
  • 7. 7 Dell EMC PowerEdge and BlueData | version (2.0) BlueData EPIC
  • 8. 8 Dell EMC PowerEdge and BlueData | version (2.0) 2 BlueData Epic Server Roles Below we describe the major server roles that are needed to stand up a production BlueData cluster. Updated sizing guidance can always be found in the most recent documentation from BlueData. 2.1 Controllers These hosts are the initial installation point of BlueData and the key orchestrators in instantiating the containerized clusters on the Worker hosts. While a single host can act as the only controller (as well as a Worker host), we recommend that you deploy 3 Controller hosts as a master, standby, and arbiter highly available configuration. By default all controllers are provisioned as Workers as well, though some customer’s requirements may dictate that container- provisioning be disabled on the controller hosts in order to dedicate them to orchestration only. These hosts can also serve as network routers between the containerized internal overlay network and the outside world. 2.2 Workers Workers form the backbone of the compute infrastructure as well as provide the disk space on which the containerized hosts spin-up their scratch/ephemeral storage. In the case of the Local Storage configuration these Worker hosts will also hold the disks that will be dedicated to providing the Data Storage capacity. You’ll want a platform here that has multiple disk options of either large form-factor (LFF) disks that are suited to high-capacity or small form-factor (SFF) disks that emphasize performance. CPU cores and memory should be sized based off of the anticipated number of containers each host should be expected to handle in addition to 1 core and 1/8 allocated RAM +1GB for overhead. It’s important to understand that CPU cores can be oversubscribed, but memory can’t be. 2.3 Gateways Gateway hosts are optional components that provide the ability to surface network end-points for the internal containerized clusters. 
This allows designs to forgo the need for a routable container network through the controllers. Containerized clusters will terminate their user interface and service end-points on the front-facing IPs of the gateway hosts, mapped to ports above 10000. These hosts can be very light-weight configurations as well as even virtual machines on existing compute infrastructure. In the case of the Hadoop Ready Bundle solutions, you could deploy these on the existing Hadoop Edge Nodes.
  • 9. 9 Dell EMC PowerEdge and BlueData | version (2.0) 3 Host Storage Types Each host in a BlueData cluster contains a couple of key types of local and/or remote storage. Some of these storage types are optionally remote (Data Storage) and others are provisioned from local disk resources (Operating System and Node Storage). 3.1 Operation System (OS) This is the storage on which the host-operating system is installed. All 3 roles (Controllers, Workers, and Gateways) have local Operating Systems that will need to be installed prior to BlueData package setup. We recommend, as with any production deployment, that you mirror the OS using RAID for resiliency. Check BlueData’s software requirements for the latest supported Linux OS revisions. Note that currently, the minimum amount of storage for the OS volume is 300GB, which can prevent the use of some OS/boot storage options like the Dell EMC Boot Optimized Storage Subsystem (BOSS) without manually overriding the installer. 3.2 Node Storage Node storage is the temporary/ephemeral/scratch space on each Worker host where the containers are instantiated. This is high-speed storage that is local to each Worker host. Regardless of deployment model, Node storage space is required. This must be un-formatted volumes that are presented to the Linux OS (e.g. /dev/sda, /dev/vda, /dev/nvme0n1). It’s important to remember that you must present each drive individually as opposed to in aggregate (RAID). In Dell PowerEdge RAID Controllers (PERC) this means flagging all the drives as “Non-RAID”. Detailed sizing considerations can be referred to in the BlueData System Architecture section of the documentation. 3.3 Data Storage The storage that each of the tenants use to store source data and results is referred to as Data storage. This can be made up of multiple local drives or of remote storage on an existing Hadoop (HDFS or Kerberos HDFS) and NAS (NFS). If using local storage, these drives must also be Non-RAID and unformatted. 
When using remote storage, you’ll use the BlueData DataTap mechanism to transparently attach over IP.
  • 10. 10 Dell EMC PowerEdge and BlueData | version (2.0) Below is an example of how you can carve up the local drives on a Dell EMC PowerEdge R740XD to provide all 3 of the storage roles needed for BlueData. This just serves and one example of hardware configurations, further examples for additional use-cases and performance characteristics are detailed in Sections 4 and 5. Example Disk Layout for Controller/Worker - Local Storage (High Performance) GB SAS 12Gb 15k GB SAS 12Gb 15k GB SATA 6Gb SSD GB SATA 6Gb SSD TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k Drive28 Drive29 Drive30 Drive31 Drive24 Drive25 Drive26 Drive27 Drive0 Drive1 Drive2 Drive3 Drive4 Drive5 Drive6 Drive7 Drive8 Drive9 Drive10 Drive11 Drive12 Drive13 Drive14 Drive15 Drive16 Drive17 Drive18 Drive19 Drive20 Drive21 Drive22 Drive23 Front Hot-Plug Mid-Bay Flexbay Data Node OS
  • 11. 11 Dell EMC PowerEdge and BlueData | version (2.0) 4 Recommended Deployment Models The flexibility of the BlueData platform really shines when as you begin to look at the possible deployment models. Below you’ll see 3 recommended models ranging from all local storage through to remote storage provided by either existing Hadoop clusters or Dell EMC Isilon’s scale-out NAS capabilities. Which model you choose should be determined after consultation with your Dell EMC and BlueData sales support resources (as well as the Dell EMC Customer Solutions Center) 4.1 Local Storage In this model, each Worker and Controller/Worker has local storage for all three storage types: OS, Node, and Data resulting in a more simplified architecture. The key differentiator in this deployment model is the local storage is allocated as Tenant data. Local disks (usually rotational) are used to make up the BlueData HDFS instance that is shared across all Tenants. Each tenant is restricted from viewing the other tenant’s files, but they are all resident on the single HDFS namespace that BlueData provides. Different server models can be used here to provide a platform that is optimized for either high-capacity (LFF) or high performance (SFF). However, we do recommend that all Worker hosts have the same disk configuration and layout. 
Dell EMC PowerEdge R740XD with Large Form-factor Drives (Top) and Small Form-factor Drives (Bottom) – Bezels Removed TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k GB SAS 12Gb 15k GB SAS 12Gb 15k GB SATA 6Gb SSD GB SATA 6Gb SSD TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k TB SATA 6Gb 7.2k
  • 12. [Figure: Local Storage Deployment Model. Labeled elements: OS, Node, and Data storage; Controllers / Workers (Master, Standby, Arbiter); Workers; Data Switches; Management Switches; iDRACs; Gateway(s) (Optional); Corporate Network]
  • 13. 4.2 Remote Storage on Dell EMC Hadoop Ready Bundle
Many customers have existing, robust, hardened primary Hadoop clusters that they would like to leverage for storage. In this deployment model, you can use BlueData as the Controller/Compute tier and leave your source data and result data on the remote HDFS solution. Dell EMC has a long history of providing validated reference configurations of both the Cloudera and Hortonworks distributions of Hadoop. These solutions can save IT departments hours of valuable time by taking care of racking, provisioning the operating systems, and installing and configuring the Hadoop distribution, allowing IT to focus on the workloads that differentiate their company.
[Figure: Remote Storage on Dell EMC Ready Bundle Deployment Model. Labeled elements: OS and Node storage; Controllers / Workers (Master, Standby, Arbiter); Workers; Data Switches; Management Switches; Data Storage; Name Node(s); iDRACs; Gateway(s) (Optional); Corporate Network]
  • 14. 4.3 Remote Storage on Dell EMC Isilon
Dell EMC Isilon's multi-protocol scale-out storage capabilities can be leveraged to provide a remote filesystem with a significantly lower total cost of ownership (TCO). By reducing the need for multiple software replicas and sharing the same data across SMB, NFS, and HDFS, Isilon can reduce the complexity of a remote storage solution while introducing a robust portfolio of data management options.
[Figure: Remote Storage on Dell EMC Isilon Deployment Model. Labeled elements: OS and Node storage; Controllers / Workers (Master, Standby, Arbiter); Workers; Data Switches; Management Switches; Data Storage; iDRACs; Gateway(s) (Optional); Corporate Network]
  • 15. 5 Component Builds
5.1 Recommended Host Counts
Components | Local Storage | Remote Storage on Dell EMC Hadoop Ready Bundle | Remote Storage on Dell EMC Isilon
Controller/Worker Hosts (with local data storage) | 3 (High Capacity or Performance) | N/A | N/A
Worker Only Hosts (with local data storage) | 7 (High Capacity or Performance) | N/A | N/A
Controller/Worker Hosts (without local data storage) | N/A | 3 | 3
Worker Only Hosts (without local data storage) | N/A | 7 | 7
Gateways | 2 | 2 | 2
Management Switches | 2 | 2 | 2
Data Switches | 2 | 2 | 2
External Storage Solution | N/A | Dell EMC Ready Bundle for Hadoop | Dell EMC Isilon Solution
Usable Node Storage | 7.68 TB (SSD) | 7.68 TB (SSD) | 7.68 TB (SSD)
Usable Data Storage | 21.3 TB (Capacity), 11.2 TB (Performance) | Varies depending on the existing Hadoop cluster's capacity | Varies depending on Isilon configuration
Table: Recommended Host Counts
5.2 Controller/Worker - Local Storage (High Capacity)
Component | Details
Server Model | Dell EMC PowerEdge R740XD
Chassis | Chassis with up to 12x 3.5" HDD, 4x 3.5" in Mid-bay, and 4x 2.5" in Flexbay
Processor(s) | Dual Intel® Xeon® Gold 6136 3.0G, 12C/24T, 24.75M Cache
Memory | 384 GB RAM (12x 32GB DIMMs)
Disk Controller | Dell PERC H730P
OS Storage | 2x 2.5" 600GB 10K SAS Drives (Flexbay), RAID 1 (Mirror)
Node Storage | 2x 2.5" 3.84TB Mixed Use SAS SSD Drives (Flexbay), Non-RAID
Data Storage | 16x 3.5" 4TB 7.2K SATA (12x Front and 4x Mid-bay), Non-RAID
Network Card | Intel X710 Quad Port 10GbE Network Daughter Card
Table: Controller/Worker - Local Storage (High Capacity)
[Figure: Dell EMC PowerEdge R740XD, Large Form-factor drive bays]
  • 16. 5.3 Controller/Worker - Local Storage (High Performance)
Component | Details
Server Model | Dell EMC PowerEdge R740XD
Chassis | Chassis with up to 24x 2.5" HDD, 4x 2.5" in Mid-bay, and 4x 2.5" in Flexbay
Processor(s) | Dual Intel® Xeon® Gold 6136 3.0G, 12C/24T, 24.75M Cache
Memory | 384 GB RAM (12x 32GB DIMMs)
Disk Controller | Dell PERC H740P
OS Storage | 2x 2.5" 600GB 10K SAS Drives (Flexbay), RAID 1 (Mirror)
Node Storage | 2x 2.5" 3.84TB Mixed Use SAS SSD Drives (Flexbay), Non-RAID
Data Storage | 28x 2.5" 1.8TB 10K SAS (24x Front and 4x Mid-bay), Non-RAID
Network Card | Intel X710 Quad Port 10GbE Network Daughter Card
Table: Controller/Worker - Local Storage (High Performance)
[Figure: Dell EMC PowerEdge R740XD, Small Form-factor drive bays]
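The usable storage figures for the High Capacity build in the Recommended Host Counts table appear consistent with simple per-host arithmetic. The sketch below is illustrative only: it assumes the figures are per host, that Node storage is the plain sum of the two non-RAID SSDs, and that Data storage is raw capacity divided by three HDFS replicas (the replica count mentioned in section 6.6). The function name `usable_tb` is hypothetical.

```python
def usable_tb(drive_count: int, drive_tb: float, replicas: int = 1) -> float:
    """Raw capacity of a drive group divided by the number of copies kept."""
    return drive_count * drive_tb / replicas


# Node storage: 2x 3.84TB SSDs, non-RAID, no replication assumed.
node_tb = usable_tb(2, 3.84)

# Data storage (High Capacity build): 16x 4TB drives, three replicas assumed.
data_tb = usable_tb(16, 4.0, replicas=3)

print(f"Usable node storage per host: {node_tb:.2f} TB")
print(f"Usable data storage per host: {data_tb:.1f} TB")
```

Changing `replicas` shows how strongly the replication factor drives usable capacity in the Local Storage model.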
  • 17. 5.4 Controller/Worker – Existing Ready Bundle / Isilon
Component | Details
Server Model | Dell EMC PowerEdge R740XD
Chassis | Chassis with up to 24x 2.5" HDD
Processor(s) | Dual Intel® Xeon® Gold 6136 3.0G, 12C/24T, 24.75M Cache
Memory | 384 GB RAM (12x 32GB DIMMs)
Disk Controller | Dell PERC H740P
OS Storage | 2x 2.5" 600GB 10K SAS Drives, RAID 1 (Mirror)
Node Storage | 2x 2.5" 3.84TB Mixed Use SAS SSD Drives, Non-RAID
Data Storage | 1x Dell EMC Isilon Generation 6 Chassis with 4x Dell EMC Isilon H500, each with 15x 4TB SATA Drives and 2x 1.6TB SSD
Network Card | Intel X710 Quad Port 10GbE Network Daughter Card
Table: Controller/Worker – Existing Ready Bundle / Isilon
5.5 Gateways (Optional)
The optional gateway hosts can be very lightweight servers whose configuration depends heavily on the customer use-case. The generic configuration below lists the recommended CPU core count, RAM, and OS storage.
Component | Details
CPU Core Count | 8 Cores
Memory | 32 GB
OS Storage | 100 GB
Table: Gateways (Optional)
  • 18. 5.6 Management Switches
Component | Details
Switch Model | Dell EMC Networking S3148-ON
Table: Management Switches
5.7 Data Switches
Component | Details
Switch Model | Dell EMC Networking S4148F-ON
Table: Data Switches
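As a rough planning aid, the host counts in section 5.1 and the four bonded 10GbE links per host described in section 6.4 can be turned into a data-port budget. This is a minimal sketch under stated assumptions: links are spread evenly across the two data switches, gateway and uplink ports are ignored, and the function name `data_port_budget` is hypothetical. Check the result against the port count of the switch model you actually deploy.

```python
import math


def data_port_budget(hosts: int, links_per_host: int, switches: int,
                     link_gbps: int = 10) -> dict:
    """Count host-facing data links and the share each switch must carry."""
    total_links = hosts * links_per_host
    return {
        "total_links": total_links,
        # Assumes links are spread evenly across the data switches.
        "links_per_switch": math.ceil(total_links / switches),
        # Aggregate bandwidth of one host's bonded links.
        "host_bandwidth_gbps": links_per_host * link_gbps,
    }


# 3 Controller/Worker hosts + 7 Worker hosts, 4 links each, 2 data switches.
budget = data_port_budget(hosts=10, links_per_host=4, switches=2)
print(budget)
```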
  • 19. 6 Additional Configuration Considerations
6.1 Host Count
This document outlines three configurations that offer three different methods for storing persistent data. The amount of compute resources (processor core count and RAM) has been kept the same in each configuration so that each can instantiate the same number of containerized applications. Note, however, that in the remote storage configurations (Remote Hadoop and Dell EMC Isilon) the local storage processing overhead is offloaded to those storage systems, so the amount of compute and memory needed could diminish. With that in mind, you could see the same workload performance in those configurations as in the local storage configuration with a lower total host count.
6.2 Node Storage
The drives providing Node storage are the best place to invest in speed. Options range from faster or larger SSDs up to PCIe NVMe devices with microsecond latency. Faster Node storage speeds up container provisioning, speeds up the storage operations those containers execute, and reduces transaction latency. Sizing this space must take into account how many containers of each solution stack will be deployed on each host. The advantage of containerization here is that you are not penalized for the OS portion of each container, since the base image is shared among all containers on the host that use it. The only thing that distinguishes one container from another is its delta from the base image.
6.3 Data Storage and Platform Type
The amount of data storage needed will vary depending on the customer use-case. The Dell EMC PowerEdge R740XD compute platform allows for multiple configurations across both the performance and capacity dimensions.
However, there are a number of additional optional platforms you could consider:
Controller/Worker – Local Storage
> Cost-Optimized Local Storage – Dell EMC PowerEdge R540
> Capacity-Optimized Local Storage – Dell EMC PowerEdge R740XD with larger drives (limit to less than 100TB raw per host as a best practice)
> Density-Optimized Local Storage – Dell EMC PowerEdge R640
> Modular Local Storage – Dell EMC PowerEdge FX2 with Dell EMC PowerEdge FC640 and Dell EMC PowerEdge FD332
Controller/Worker – Compute Only
> Cost-Optimized Compute Only – Dell EMC PowerEdge R440
> Density-Optimized Compute Only – Dell EMC PowerEdge R640
> Modular Compute Only – Dell EMC PowerEdge FX2 with Dell EMC PowerEdge FC640
> Hyper-scale – Dell EMC PowerEdge C6420
6.4 Host Networking
The amount of bandwidth delivered to each host should account for storage traffic: host-to-host traffic if using local storage, or remote storage traffic if using a remote solution such as an existing Hadoop cluster or Isilon. In our recommended configurations, we allocated a bonded set of four (4) 10GbE links for both throughput and failover. There are many additional ways to deliver substantial bandwidth to Dell EMC PowerEdge platforms, including multiple 40GbE links as well as the emerging 25GbE solutions.
  • 20. 25GbE shows particular promise, as its cost-to-performance ratio is unmatched by the other technologies. As you scale this solution to multiple racks, we recommend a leaf/spine design to allow for scale-out. Dell EMC's wide portfolio of aggregation-tier switches at 40GbE and 100GbE can provide the spine for your topology. In our recommended configurations, each host carries both the Container and the Management traffic on the same interface. You could split this traffic using VLAN tagging, or divide the 4x 10GbE ports into two 2x 10GbE trunks, with one trunk handling management traffic and the other handling container traffic.
6.5 Ready Solution Speed / Capacity
Dell EMC's Ready Solutions for Hadoop offer multiple scaling options for performance and modularity. When implementing this solution, you can scale the Ready Solution storage independently of the BlueData compute solution. To improve remote storage speed, you could scale either the network or the disk speed of the Hadoop worker hosts. To add capacity, you can scale the number of Hadoop worker hosts in the remote solution or increase the drive capacity in the worker hosts.
6.6 Isilon Model
With a broad portfolio of models, the Dell EMC Isilon product family has many options to help you optimize your solution for speed, capacity, or any mix of the two. All-flash models can significantly improve storage latency. Models with larger, slower drives can help optimize your solution for storage capacity without the physical space that three replicas require in the Local Storage configuration.
> Capacity Focused – Dell EMC Isilon H500, each with 15x 4TB SATA and 2x 1.6TB SSD
> Performance (Disk) Focused – Dell EMC Isilon H600, each with 30x 1.2TB SAS and 2x 1.6TB SSD
> Performance (All-Flash) Focused – Dell EMC Isilon F800, each with 15x 3.2TB SSD
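Returning to the Node storage sizing guidance in section 6.2, the point that containers share a base image and differ only by their deltas can be sketched as a small estimate. Everything here is hypothetical and for illustration only: the `Stack` type, the stack names, and all sizes are invented, and real image and delta sizes depend on the solution stacks you deploy.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Stack:
    name: str
    base_image_gb: float  # shared base image, stored once per host
    delta_gb: float       # per-container changes on top of the base image
    containers: int       # containers of this stack on the host


def node_storage_needed_gb(stacks: Iterable[Stack]) -> float:
    """Sum one base image per deployed stack plus a delta per container."""
    return sum(
        s.base_image_gb + s.delta_gb * s.containers
        for s in stacks
        if s.containers > 0
    )


# Hypothetical mix of solution stacks on a single host.
stacks = [
    Stack("spark", base_image_gb=4.0, delta_gb=10.0, containers=5),
    Stack("hadoop", base_image_gb=6.0, delta_gb=20.0, containers=2),
]
needed_gb = node_storage_needed_gb(stacks)
print(f"Estimated node storage needed: {needed_gb:.0f} GB")
```

Comparing the estimate with the usable Node storage of a host (7.68 TB in the recommended builds) gives a first indication of how many containers a host can accommodate.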
  • 21. 7 Conclusion
As many organizations are finding out, flexibility and agility are keys to success in data analytics. This recommended configuration for BlueData on Dell EMC PowerEdge Servers, Ready Solutions, and Isilon lets you instantiate many of today's leading data analytics tools while providing infrastructure to build on for the tools of tomorrow. Consolidating your data analytics resources into a single shared resource pool keeps data under control and resources efficiently utilized, which allows you to focus your effort where it matters most rather than on the infrastructure plumbing.
  • 22. A Tested Configuration Details
The configuration below was stood up in the Dell EMC Customer Solution Centers to perform basic validation routines. Testing consisted of running in Local Storage mode as well as attaching to an existing HDFS filesystem provided by a Dell EMC Ready Solution for Cloudera Hadoop cluster.
Component | Details
Server Model | Dell EMC PowerEdge R740XD
Chassis | Chassis with up to 24x 2.5" HDD including 12 NVMe Drive Bays (not used)
Processor(s) | Dual Intel® Xeon® Gold 6132 2.6G, 14C/28T, 19M Cache
Memory | 192 GB RAM (12x 16GB DIMMs)
Disk Controller | Dell PERC HBA330
OS Storage | 1x 2.5" 1.8TB 10K SAS Drive
Node Storage | 2x 2.5" 1.8TB 10K SAS Drives
Data Storage | 9x 2.5" 1.8TB 10K SAS Drives
Network Card | Intel X710 Quad Port 10GbE Network Daughter Card
Operating System Version | CentOS Server 7.3.1611 (64-bit)
BlueData EPIC Version | 3.0.5 Build 2158
Table: Tested Configuration - Controller/Worker Hosts
Component | Details
Switch Model | Dell EMC Networking S3048-ON
Table: Tested Configuration - Management Switch
Component | Details
Switch Model | Dell EMC Networking S4048-ON
Table: Tested Configuration - Data Switch
Component | Details
Dell EMC Ready Bundle Revision | Dell EMC Ready Bundle for Cloudera Hadoop 5.10
Cloudera Hadoop Revision | Cloudera CDH 5.10
Table: Tested Remote Storage Solution Version
  • 24. B Dell EMC Customer Solution Centers
The Dell EMC Customer Solution Centers are a global network of connected labs that allow Dell to help customers architect, validate, and build solutions. With footprints in every region, they can help you through an informal 30-60 minute briefing, a longer half-day workshop, or a proof-of-concept that lets you kick the tires of a solution before signing on the dotted line. Simply engage with your account team and have them submit a request to get started.
[Figure: Dell EMC Customer Solution Centers Locations and Offerings, organized around Customer Use-Cases and Requirements]
> Validate (Proof of Concept & Design): Practical hands-on implementation and validation.
> Innovate (Design Workshop, Customized Technical Consultations): Open discussions to help customers explore solution options.
> Collaborate (Technical Briefing, White-boarding and Solution Conversations): Physical and virtual engagements with global connectivity.
  • 25. C Technical support and resources
Dell.com/support is focused on meeting customer needs with proven services and support.
Dell TechCenter is an online technical community where IT professionals have access to numerous resources for Dell EMC software, hardware, and services.
Storage Solutions Technical Documents on Dell TechCenter provide expertise that helps to ensure customer success on Dell EMC Storage platforms.
Dell EMC Customer Solution Centers are locations where Dell EMC account teams can bring customers to engage with our Subject Matter Experts to demonstrate how the Dell EMC portfolio can deliver value.
BlueData EPIC Documentation
C.1 Related resources
Dell Reference Configuration for Dell PowerEdge Servers (Version 1.0 of this document)