What it takes to run Hadoop at Scale: Yahoo! Perspectives
1. What it Takes to Run Hadoop at Scale: Yahoo Perspectives
PRESENTED BY Sumeet Singh, Rajiv Chittajallu | June 11, 2015
Hadoop Summit 2015, San Jose
2. Introduction
2
Senior Engineer with the Hadoop Operations team at
Yahoo
Involved with Hadoop since 2006, starting with the
early 400-node to over 42,000-node prod env. today
Started with Center for Development of Advanced
Computing in 2002 before joining Yahoo in
BS degree in Computer Science from Osmania
University, India
Rajiv Chittajallu
Sr. Principal Engineer
Hadoop Operations
701 First Avenue,
Sunnyvale, CA 94089 USA
@rajivec
Manages Cloud Storage and Big Data products team
at Yahoo
Responsible for Product Management, Strategy and
Customer Engagements
Managed Cloud Engineering products teams and
headed Strategy functions for the Cloud Platform
Group at Yahoo
MBA from UCLA and MS from RPI
Sumeet Singh
Sr. Director, Product Management
Cloud Storage and Big Data Platforms
701 First Avenue,
Sunnyvale, CA 94089 USA
@sumeetksingh
3. Hadoop a Secure Shared Hosted Multi-tenant Platform
3
TV
PC
Phone
Tablet
Pushed Data
Pulled Data
Web Crawl
Social
Email
3rd Party Content
Data Highway
Hadoop Grid
BI, Reporting, Adhoc Analytics
Data
Content
Ads
No-SQL
Serving Stores
Serving
4. Platform Evolution (2006 – 2015)
[Chart: Raw HDFS (in PB) and # Servers by year, 2006–2015]
Milestones:
- Yahoo! commits to scaling Hadoop for production use
- Research workloads in Search and Advertising
- Production (modeling) with machine learning & WebMap
- Revenue systems with security, multi-tenancy, and SLAs
- Open sourced with Apache
- Hortonworks spinoff for enterprise hardening
- Nextgen Hadoop (H 0.23, YARN)
- New services (HBase, Storm, Spark, Hive)
- Increased user base with partitioned namespaces
- Apache H2.6 (scalable ML, latency, utilization, productivity)

          Servers  Use Cases
Hadoop     43,000        300
HBase       3,000         70
Storm       2,000         50
5. Top 10 Considerations for Scaling a Hadoop-based Platform
1. On-Premise or Public Cloud
2. Total Cost of Ownership (TCO)
3. Hardware Configuration
4. Network
5. Software Stack
6. Security and Account Management
7. Data Lifecycle Management and BCP
8. Metering, Audit and Governance
9. Integration with External Systems
10. Debunking Myths
6. On-Premise or Public Cloud – Deployment Models
On-Premise
- Private (dedicated) Clusters: large demanding use cases; new technology not
  yet platformized; data movement and regulation issues
- Hosted Multi-tenant (private cloud) Clusters: source of truth for all of the
  org's data; app delivery agility; operational efficiency and cost savings
  through economies of scale
Public Cloud
- Hosted Compute Clusters: when more cost effective than on-premise; time to
  market/results matter; data already in public cloud
- Purpose-built Big Data Clusters: for performance, tighter integration with
  tech stack; value added services such as monitoring, alerts, tuning and
  common tools
7. On-Premise or Public Cloud – Selection Criteria
Cost
  On-Premise: fixed, does not vary with utilization; favors scale and 24x7 centralized ops
  Public Cloud: variable with usage; favors run-and-done, decentralized ops
Data
  On-Premise: aggregated from disparate or distributed sources
  Public Cloud: typically generated and stored in the cloud
SLA
  On-Premise: job queue, capacity scheduler, BCP, catchup; controlled latency and throughput
  Public Cloud: no guarantees (beyond uptime) without provisioning additional resources
Tech Stack
  On-Premise: control over deployed technology; requires platform team/vendor support
  Public Cloud: little to no control over tech stack; no need for platform R&D headcount
Security
  On-Premise: shared env., control over data/movement, PII, ACLs, pluggable security
  Public Cloud: data typically not shared among users in the cloud
Multi-tenancy
  On-Premise: matters, complex to develop and operate
  Public Cloud: does not matter, clusters are dynamic/virtual and dedicated
8. On-Premise or Public Cloud – Evaluation
8
1
On-Premise
Public Cloud
Cost
Data
SLA
Tech Stack
Security
Multi-tenancy
9. On-Premise or Public Cloud – Utilization Matters
[Chart: Cost ($) vs. Utilization/Consumption (Compute and Storage). On-premise
Hadoop as a Service has a high starting cost but scales up gradually; on-demand
and terms-based public cloud services start cheap but grow with usage. Below the
crossover point, utilization favors public cloud service; above it, on-premise
Hadoop as a Service.]
Current and expected or target utilization can provide further insights into
your operations and cost competitiveness.
10. Total Cost Of Ownership (TCO) – Components
Monthly TCO: $2.1M (ILLUSTRATIVE), split across seven components (shares shown
on the slide: 60%, 12%, 10%, 7%, 6%, 3%, 2%):
1. Cluster Hardware — data nodes, name nodes, job trackers, gateways, load
   proxies, monitoring, aggregator, and web servers
2. R&D HC — headcount for platform software development, quality, and release
   engineering
3. Active Use and Operations (Recurring) — recurring datacenter ops cost
   (power, space, labor support, and facility maintenance)
4. Network Hardware — aggregated network component costs, including switches,
   wiring, terminal servers, power strips etc.
5. Acquisition/Install (One-time) — labor, POs, transportation, space, support,
   upgrades, decommissions, shipping/receiving etc.
6. Operations Engineering — headcount for service engineering and data
   operations teams responsible for day-to-day ops and support
7. Network Bandwidth — data transferred into and out of clusters for all colos,
   including cross-colo transfers
11. Total Cost Of Ownership (TCO) – Unit Costs (Hadoop)
Compute (Memory) — container memory where apps perform computation and access
HDFS if needed. Unit: $/GB-Hour (H 2.0+), GBs of memory available for an hour.
Unit cost = Monthly Memory Cost / Avail. Memory Capacity.
Compute (CPU) — container CPU cores used by apps to perform computation/data
processing. Unit: $/vCore-Hour (H 2.6+), vCores of CPU available for an hour.
Unit cost = Monthly CPU Cost / Avail. CPU vCores.
Storage — HDFS (usable) space needed by an app with the default replication
factor of three. Unit: $/GB of data stored, usable storage space (less
replication and overheads). Unit cost = Monthly Storage Cost / Avail. Usable
Storage.
Network Bandwidth — bandwidth needed to move data into/out of the clusters by
the app. Unit: $/GB for inter-region data transfers, inter-region (peak) link
capacity. Unit cost = Monthly BW Cost / Monthly GB In + Out.
Namespace — files and directories used by the apps (to understand/limit the
load on the NN).
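The unit-cost formulas above can be sketched in a few lines. All capacity figures below are made-up placeholders, and splitting the monthly TCO 30/30/40 across memory, CPU, and storage is the assumption the talk mentions, not a fixed rule:

```python
# Illustrative unit-cost calculation for a Hadoop grid (all figures made up).
# Unit cost = monthly cost of a resource pool / monthly available capacity.

HOURS_PER_MONTH = 730  # average hours in a month

def unit_cost_per_hour(monthly_cost_usd, capacity_units):
    """$ per unit-hour, e.g. $/GB-hour for memory or $/vCore-hour for CPU."""
    return monthly_cost_usd / (capacity_units * HOURS_PER_MONTH)

monthly_tco = 2_100_000          # $2.1M illustrative monthly TCO from the slides
mem_cost = 0.30 * monthly_tco    # assumed 30/30/40 memory/CPU/storage split
cpu_cost = 0.30 * monthly_tco
storage_cost = 0.40 * monthly_tco

avail_memory_gb = 1_500_000      # total YARN container memory (hypothetical)
avail_vcores = 400_000           # total YARN vCores (hypothetical)
usable_storage_gb = 300_000_000  # usable (not raw) HDFS space (hypothetical)

print(f"$/GB-hour:    {unit_cost_per_hour(mem_cost, avail_memory_gb):.6f}")
print(f"$/vCore-hour: {unit_cost_per_hour(cpu_cost, avail_vcores):.6f}")
print(f"$/GB stored:  {storage_cost / usable_storage_gb:.6f}")
```

Note the storage unit cost divides by usable capacity with no hour term, matching the $/GB-stored unit on the slide.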
12. Total Cost Of Ownership (TCO) – Consumption Costs
Compute (Memory) — monthly job and task cost:
  Map GB-Hours = GB(M1) x T(M1) + GB(M2) x T(M2) + ...
  Reduce GB-Hours = GB(R1) x T(R1) + GB(R2) x T(R2) + ...
  Cost = (M + R) GB-Hours x $0.002 / GB-Hour / Month = $ for the Job/Month
  Monthly roll-ups: (M+R) GB-Hours for all jobs can be summed up for the month
  for a user, app, BU, or the entire platform
Compute (CPU) — monthly job and task cost:
  Map vCore-Hours = vCores(M1) x T(M1) + vCores(M2) x T(M2) + ...
  Reduce vCore-Hours = vCores(R1) x T(R1) + vCores(R2) x T(R2) + ...
  Cost = (M + R) vCore-Hours x $0.002 / vCore-Hour / Month = $ for the Job/Month
  Monthly roll-ups: (M+R) vCore-Hours for all jobs can be summed up for the
  month for a user, app, BU, or the entire platform
Storage:
  $ / project (app) quota in GB (peak monthly used)
  $ / user quota in GB (peak monthly used)
  $ / data, with each user accountable for their portion of use, e.g.
  GB Read (U1) / (GB Read (U1) + GB Read (U2) + ...)
  Roll-ups through the relationship among user, file ownership, app, and their BU
Network Bandwidth:
  Bandwidth measured at the cluster level and divided among select apps and
  users of data based on average volume In/Out
  Roll-ups through the relationship among user, app, and their BU
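As a concrete sketch of the GB-hour math above — the task sizes and run times are hypothetical; only the $0.002/GB-hour rate comes from the slide:

```python
# Cost of a MapReduce job from its per-task container sizes and run times,
# following the GB-hour formula on the slide. Task values are hypothetical.

RATE_PER_GB_HOUR = 0.002  # $/GB-hour/month, from the slide

def gb_hours(tasks):
    """tasks: list of (container_gb, runtime_hours) tuples."""
    return sum(gb * hours for gb, hours in tasks)

map_tasks = [(2.0, 0.5), (2.0, 0.75), (4.0, 0.25)]   # hypothetical map tasks
reduce_tasks = [(4.0, 1.0), (4.0, 0.5)]              # hypothetical reduce tasks

job_gb_hours = gb_hours(map_tasks) + gb_hours(reduce_tasks)
job_cost = job_gb_hours * RATE_PER_GB_HOUR
print(f"{job_gb_hours} GB-hours -> ${job_cost:.4f} for the month")
# 9.5 GB-hours -> $0.0190 for the month
```

Summing `job_gb_hours` across all jobs for a user, app, or BU gives the monthly roll-ups the slide describes.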
13. Hardware Configuration – Physical Resources
Clusters in Datacenters: Datacenter 1 — Rack 1 ... Rack N
Server Resources: C-nn / 64, 128, 256 G / 4000, 6000 etc.
14. Hardware Configuration – Eventual Heterogeneity
14
3
24 G    8 cores        SATA 0.5 TB
48 G    12 cores       SATA 1.0 TB
64 G    Harpertown     SATA 2.0 TB
128 G   Sandy Bridge   SATA 3.0 TB
192 G   Ivy Bridge     SATA 4.0 TB
256 G   Haswell        SATA 6.0 TB
384 G

Heterogeneous Configurations: 10s of configs of data nodes (collected over the
years) without dictating scheduling decisions – let the framework balance out
the configs
Heterogeneous Storage: HDFS supports heterogeneous storage (HDD, SSD, RAM,
RAID etc.) – HDFS-2832, HDFS-5682
Heterogeneous Scheduling: operate purpose-built hardware in the same cluster
(e.g. GPUs) – YARN-796
15. Network – Common Backplane
Hadoop: NameNode, RM; DataNode, NodeManager
HBase: NameNode, HBase Master; DataNodes, RegionServers
Storm: Nimbus, Supervisor
Administration, Management and Monitoring: ZooKeeper Pools, HTTP/HDFS/GDM
Load Proxies
Applications and Data: Data Feeds, Data Stores, Oozie Server, HS2/HCat
Network Backplane
16. Network – Bottleneck Awareness
1. Large dataset joins or data sharing over the network between Hadoop
   clusters (Data Set 1 / Data Set 2)
2. Large extractions from the HBase cluster (low-latency data store) may
   saturate the network
3. Fast bulk updates from the Storm cluster (real-time/stream processing) may
   saturate the network
4. Large data copies (e.g. between Hadoop and Storm clusters) may not be
   possible
17. Network – 1/10G BAS (Rack Locality Not A Major Issue)
[Diagram: eight BAS pairs (BAS1-1/BAS1-2 ... BAS8-1/BAS8-2), each aggregating
N racks of RSWs over an L3 backplane into a fabric layer of FAB 1–8]
1 Gbps host links; 10 Gbps RSW uplinks; 8 x 10 Gbps into the fabric layer;
2:1 oversubscription; 48 racks, 15,360 hosts; SPOF!
21. Software Stack – Obsess With Use Cases, Not Tech
Stack layers: Common, YARN (Scheduling, Resource Management), HDFS (File System)
Tech categories: platformized tech with production support; in-progress, unmet
needs, or Apache alignment
RHEL6 64-bit, JDK8
22. Security and Account Management – Overview
22
6
Grid
Identity,
Authentication and
Authorization
User Id
SSO
Groups, Netgroups, Roles
RPC (GSSAPI)
UI (SPNEGO)
24. Data Lifecycle Management and BCP
Data Lifecycle: Source -> Acquisition -> Replication (Feeds) -> Retention
(policy-based expiration) -> Archival (tape backup) -> DataOut
Datastore: defines a data source/target (e.g. HDFS)
Dataset: defines the data flow of a feed
Workflow: defines a unit of work carried out by acquisition, replication, and
retention servers for moving an instance of a feed
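Policy-based expiration is simple to sketch. A minimal version, assuming yyyyMMdd-dated feed partitions (the naming convention here is an illustration, not the actual grid's):

```python
# Sketch of policy-based retention: find dated feed partitions older than the
# retention window. The yyyyMMdd partition naming is an assumed convention.
from datetime import date, timedelta

def expired_partitions(partitions, retention_days, today):
    """Return partition date-strings older than `retention_days`."""
    cutoff = today - timedelta(days=retention_days)
    return [p for p in partitions
            if date(int(p[:4]), int(p[4:6]), int(p[6:8])) < cutoff]

parts = ["20150601", "20150520", "20150415"]
print(expired_partitions(parts, 30, date(2015, 6, 11)))
# with a 30-day policy, partitions before 2015-05-12 are candidates to drop
```

A retention server would run a scan like this on a schedule and drop the returned partitions.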
25. Data Lifecycle Management and BCP
Grid Data Management (Acquisition, Replication, Retention, Archival, DataOut)
spans Cluster 1 (Colo 1) and Cluster 2 (Colo 2), each with HDFS and a MetaStore:
- Feed acquisition lands feed datasets as partitioned external tables; Growl
  extracts the schema for backfill
- On load, HCatClient.addPartitions(...) is called and LOAD_DONE is marked,
  generating an add_partition event notification; feed replication does the
  same on the second cluster
- Partitions are dropped (HCatClient.dropPartitions(...)) after retention
  expiration, with a drop_partition notification
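The add-partition/notification flow can be sketched with a hypothetical in-memory stand-in for HCatClient. The real API is Java (org.apache.hive.hcatalog.api.HCatClient); everything below is illustrative, showing only the event pattern — register a partition, fire a notification, let downstream consumers react:

```python
# Minimal sketch of the add-partition / notification flow from the slide,
# using a hypothetical in-memory stand-in for HCatClient.

class FakeHCatClient:
    def __init__(self):
        self.partitions = {}   # (db, table) -> set of partition specs
        self.listeners = []    # callbacks fired on add_partition events

    def add_partitions(self, db, table, specs):
        self.partitions.setdefault((db, table), set()).update(specs)
        for notify in self.listeners:  # add_partition event notification
            notify(db, table, specs)

    def drop_partitions(self, db, table, specs):
        # e.g. called by retention after a feed's window expires
        self.partitions[(db, table)] -= set(specs)

client = FakeHCatClient()
loaded = []
client.listeners.append(lambda db, t, s: loaded.extend(s))  # replication hook
client.add_partitions("feeds", "clicks", {"dt=20150611"})
print(loaded)  # the downstream consumer saw the new partition
```

The LOAD_DONE marker in the real flow plays the role of the notification here: consumers only act once a partition instance is complete.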
26. Metering, Audit, and Governance
Starling (log warehouse) collects from log sources:
- Hadoop clusters (Cluster 1 ... Cluster n): FS, Job, and Task logs
- HBase clusters (Cluster 1 ... Cluster n): CF, Region, Action, and Query stats
- MetaStores (MS 1 ... MS n): DB, table, partition, and column access stats
- GDM (F 1 ... F n): data definition, flow, feed, and source
27. Metering, Audit, and Governance
Data Discovery and Access — governance by classification (Public,
Non-sensitive, Financial, Restricted) with controls including LMS integration,
Stock Admin integration, and approval flows; public data carries no additional
requirement.
28. Integration with External Systems
28
9
BI, Reporting, Transactional DBs
Hadoop Customers
…
DH
Cloud Messaging
Serving Systems
Monitoring, Tools, Portals
Infrastructure in Transition
29. Debunking Myths
✗ Hadoop isn't enterprise ready
✗ Hadoop isn't stable, clusters go down
✗ You lose data on HDFS
✗ Data cannot be shared across the org
✗ NameNodes do not scale
✗ Software upgrades are rare
✗ Hadoop use cases are limited
✗ I need expensive servers to get more
✗ Hadoop is so dead
✗ I need Apache this vs. that
(30 secs)
T: 30 secs
Let’s get started. I always wanted to document and talk about some of the top considerations that go into making Hadoop a scalable platform for the entire business, and this was a perfect opportunity to do so.
(30 secs)
T: 1 min
My name is Sumeet Singh, and I am a Sr. Director of Products for Hadoop and Big Data platforms at Yahoo. I have been working at Yahoo for nearly three and a half to four years now, and have had a few different roles in my time there. The top 10 list I am presenting is what I came up with. You may come up with a different list, or think of other things that may be more important for your businesses. Let me set some context first on Hadoop at Yahoo, and then we will jump right into it.
(1 min)
T: 2 min
Hadoop is a secure shared platform at Yahoo; a lot of people call it the public grid, but it's a private cloud. Yahoo products and properties across devices generate a lot of data that is of immense value to us in driving new and interesting user experiences across devices. All that data comes to Hadoop, which acts as a single source of truth for all data at Yahoo. A wide variety of other data also gets pulled into the Hadoop Grid from various sources as shown. The idea is to consolidate data from across the company from disparate sources in one place so that it can be (a) shared (b) enriched (c) de-duped (d) kept up to date. That data, once processed, is applied back or served as value to our consumers in the form of personalized experiences across our products and properties. And of course it is used for reporting and analytics. All of this is done while keeping web-scale economics and cost in mind.
(30 secs)
T: 2 min 30 secs
With improving platform capabilities and robustness, the workloads over the years have systematically moved from research into production and revenue systems. The use cases running on the platform have continued to grow, and we quickly reached a point where we started to depend on it. The growth is evident from the storage we have added at roughly 20-25% CAGR in the last five years.
(1 min)
T: 3 min 30 secs
Ok, so how did we get here? In the rest of the talk, I want to walk you through some of the top considerations that go into making Hadoop a scalable platform to run your entire business on.
(1) First, of course, is where you run and how you should think about it
(2) Second is the cost considerations and knowing who and what costs how much
(3) Third, of course is how to choose and think about hardware configurations
(4) Network is a big piece of the puzzle, and we will talk about that
(5) Software stack that you want to run on your platform
(6) Security is of course a big area, we will not get too detailed into any one of these
(7) Data movement or lifecycle management, including BCP provisions
(8) Then, we will get into audit and governance
(9) How well Hadoop works with other/ external systems
(10) And, finally, talking about some of the issues or non-issues when you make architectural choices on the platform.
(1 min)
T: 4 min 30 secs
Perhaps the first consideration would be where you want to run your workloads: on on-premise infrastructure or in the public cloud. With on-premise, you can go with two different deployment models: (1) private clusters that you would often set up for large demanding use cases, newer technology, or for regulatory reasons, and (2) the shared grid, which is the source of truth for most data at the company.
Public cloud models also come in two flavors. (1) You acquire hosted compute clusters and run Hadoop on them, mostly for setup-time or cost reasons, or if your data is already generated and stored in the public cloud. (2) Purpose-built big data clouds are also becoming popular, with more intense out-of-the-box integration among the offered stack components.
(2 min)
T: 6 min 30 secs
There is no right or wrong answer when you are thinking about which model to go after. It depends on your particular situation, your workloads, your plans for the future, adjacencies, and other investments in cloud. Let's consider some of the aspects that make you sway one way or the other.
(1) Cost – On-premise requires investments in setup and maintenance, public cloud has a convenience aspect to it as well.
(2) Data – You need to think about data movement and data serving. One of the most important aspects to consider
(3) SLA – There are multiple mechanisms to make sure a variety of SLAs are covered, not just uptime
(4) Tech Stack – You can deploy what you need in both places, but on-premise you have better control over the stack and its integration with other aspects of the cloud, such as monitoring and the serving stack (in our case data serving, ad serving, and content serving)
(5) Security – one of the deepest topics you need to think about.
(6) Multi-tenancy – security and multi-tenancy are hard. If your plans are to scale up your infrastructure, you need to think about it. Company policies also impact your software and operations.
(30 secs)
T: 7 min
Cost and SLAs (which is to say price and performance) are something you can evaluate quantitatively, but the other factors require careful consideration and choices. A single factor can sway you in one direction or the other.
(1 min)
T: 8 min
So when does an on-premise infrastructure make sense? When you amortize your investments over a higher usage of resources. In a nutshell, as you benchmark, you will realize that as utilization/consumption increases, you will eventually start to look fairly attractive compared with a public cloud model. Do not stop at the cost; conduct a sensitivity analysis based on your expected/target utilization before you make a decision.
(2 min)
T: 10 min
Knowing your true cost of operating a platform is important not only for metering but also for giving projects running on the platform a good idea of their ROI and profitability.
This is perhaps one of the most difficult exercises, as it requires running around to gather data that is often hard to get. Nonetheless, this is the first step and it is required. We will walk our way up.
Hardware – not just the namenodes and datanodes, but all kinds of other things involved. Also, most Hadoop servers have compute and JBODs, so remember that this is all things hardware.
R&D HC – this is not for writing or developing apps on top of hadoop, this is HC needed for platform development
Ops Cost – power is one of the biggest elements here.
Network – can be substantial depending on the backplane capacity and architectural choices
Acq./ Install – one time costs
Ops – this is where Yahoo excels: the number of people needed to maintain the infra.
Network bandwidth – can be significant depending on your operation.
As you understand your total cost of ownership, you can get to a monthly figure. Remember to properly depreciate assets and bring everything to a monthly run rate (opex + capex).
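The run-rate arithmetic is worth making explicit. A minimal sketch with straight-line depreciation; the figures are illustrative, chosen to land on the $2.1M monthly example from the slides:

```python
# Turning one-time hardware spend plus recurring opex into a monthly TCO run
# rate (straight-line depreciation; all figures illustrative).

def monthly_run_rate(capex_usd, depreciation_months, monthly_opex_usd):
    """Monthly TCO = depreciated capex + recurring opex."""
    return capex_usd / depreciation_months + monthly_opex_usd

# e.g. $36M of hardware depreciated over 36 months, plus $1.1M/month of
# power, network, and headcount (hypothetical split)
print(monthly_run_rate(36_000_000, 36, 1_100_000))  # 2100000.0
```

Whatever depreciation schedule your finance team uses, the point is that unit costs downstream should divide a blended monthly figure, not raw capex.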
(2 min)
T: 12 min
Once you have your monthly TCO, it is time to understand your unit costs. So, let’s talk about resources you are going to consume first.
Compute (Memory) is for the YARN containers where your Map or Reduce tasks run. The unit is $/GB-hr/month. You need to know your monthly compute TCO and total compute capacity
Compute (CPU) is for YARN containers in terms of CPU vCores – 10 vCores per hyper-threaded core, or 20 vCores per physical CPU core.
Storage – usable storage space instead of raw storage space.
Bandwidth – calculated at the cross-colo link level, but you get your monthly bill based on your portion of data in and out. So, your unit cost is $/(data in+out).
Namespace – while we do not cost it out separately for the namenode, it is important to manage it. We have allocations/ quota, simply the sum of all the files and directories.
(2 min)
T: 14 min
Remember the monthly TCO we calculated earlier. I am splitting my monthly TCO in 30/30/40 among compute memory, compute CPU and storage. You can choose based on your situation. If you buy them separately, then you already know the split.
My total slots or memory in the infrastructure is summed up and converted to hours. You need the notion of time as you will associate the units to usage by the users.
So, total cost/ total capacity (in slot hours) or in usable storage (good so that you can benchmark effectively should you go to cloud systems that charge for usable storage of data, not raw storage capacity).
And similarly for the bandwidth.
(30 secs)
T: 15 min
There are only four resources consumed in Hadoop operations when it comes to hardware: memory, CPU, storage/disk, and network. We will talk about networks within the datacenter at length as our fourth consideration. The best way to think of these resources is as configurable when deciding the type of server hardware you want to set up for Hadoop: combine these resources into a configuration you can certify through performance evaluation.
(1 min)
T: 16 min
Here are some of the possible combinations. As you can see, even today our clusters have generations of memory and CPU, as well as disk drives configured as JBODs, giving us well over 10 configurations live in production. Beyond recognizing the configurations in the overall pool, we let the framework make decisions when managing a cluster with heterogeneous configurations. As far as storage is concerned, Hadoop has support for multiple types of storage, although we are chugging along fine with disk drives. We are now also composing clusters with large-memory boxes and GPUs, particularly for machine learning, and let applications choose those configurations for specific needs based on labels assigned to them.
(2 min)
T: 18 min
Beyond servers, the network is another important aspect of the overall setup. We host all shared compute clusters on a dedicated backplane. Within a datacenter, bandwidth between all compute racks is consistent, independent of the clusters. This allows for inter-cluster access at transfer rates similar to intra-cluster ones. Nodes (racks) can be moved between clusters without network reconfiguration or physical moves.
(1 min)
T: 19 min
Big data applications can be demanding on the network, as they often present many-to-one traffic flows, otherwise known as incast. For example, you may be joining or accessing data between two Hadoop clusters. Large data extractions between HBase and Hadoop are common for web applications such as search. Between Storm and HBase, where incremental processing works great, fast bulk updates become an issue. Similarly when moving data between Hadoop and Storm clusters. We have seen saturation, although mostly at TOR switches before anything else gets overwhelmed.
(2 min)
T: 21 min
One way to compose your network among racks is through what we call BAS switches, a.k.a. "big ass switches," or the big-box fabric design (core – distribution – access layers; some call it aggregation and/or core). The host connect is 1G to the TOR; the switch uplink is 10G. The minimum all-to-all guarantee this backplane can support across 15,000+ servers is 500 Gbps with a 2:1 oversubscription. Oversubscription is the ratio of contention should all devices send traffic at the same time (1G copper, 8-way: 200 Mbps; 10G fiber, 2-way: 500 Mbps). This typical datacenter design can get inflexible to scale, and $/server for 10G may be expensive.
Also, given that nodes in Hadoop have IP addresses, you rely on L3/routing protocols and STP, which can again get inflexible and hard.
(2 min)
T: 23 min
A new leaf-and-spine architecture, modeled on old circuit-switched networks (1950s), is becoming popular now for better latency and $/server cost. Here the RSW can be 48x10G with 4x40G uplinks, and the 4x40G uplinks can be broken down into 16x10G (up to 160G). The number of VCs you construct defines the uplinks (e.g. 2 VCs with 2x40G), and the composition of the VCs defines the TORs you can have. Each leaf can support 16 TOR switches, so 32 leaves give 512 TORs, or 20,000+ hosts (20,480) that can be connected. With 2 VCs at 40G each, a rack has 80G of uplink against 400G of host capacity, i.e. 5:1 oversubscription; to decrease oversubscription, build more VCs. The number of leaves is always 2x the number of spines for non-blocking operation: half the ports go to the spine and half to the racks. Each TOR has 4 x 10G up.
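The oversubscription ratios quoted for both designs check out with a one-line calculation. The 40-hosts-per-rack figure below is inferred from the slide totals (15,360 hosts across 384 racks), not stated directly:

```python
# Oversubscription = total possible host traffic into a switch divided by that
# switch's uplink capacity. Reproducing the talk's two topology examples.

def oversubscription(hosts, host_gbps, uplinks, uplink_gbps):
    return (hosts * host_gbps) / (uplinks * uplink_gbps)

# BAS design: 40 hosts/rack at 1G into a TOR with 2x10G of uplink -> 2:1
print(oversubscription(40, 1, 2, 10))   # 2.0

# Leaf-spine: 40 hosts at 10G into a TOR with 2 VCs x 40G of uplink -> 5:1
# (the talk's "5:1 == 400G / 80G")
print(oversubscription(40, 10, 2, 40))  # 5.0
```

Adding VCs raises the denominator, which is exactly the "build more VCs to decrease oversubscription" lever mentioned above.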
(1 min)
T: 24 min
(1 min)
T: 25 min
(1 min)
T: 26 min
(1 min)
T: 27 min
User accounts live in LDAP, with netgroups for access control. Automated jobs/workflows run via headless accounts. There are two Kerberos REALMs with one-way trusts: Active Directory for individual users, and a Hadoop-specific REALM for headless users and service principals. The architecture is SOX compliant. Kerberos provides the Generic Security Service API (GSSAPI) for RPC auth, and the Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO) is used for HTTP auth on the web UIs.
(1 min)
T: 28 min
(1 min)
T: 29 min
(2 min)
T: 31 min
Each site has a corresponding prod/non-prod cluster for user applications to fail over to. Feeds are kept in sync using replication. Applications develop a BCP strategy using native Hadoop services (e.g. async replication).