Hoya: HBase on YARN
Steve Loughran & Devaraj Das
{stevel, ddas} at hortonworks.com
@steveloughran, @ddraj
November 2013

© Hortonworks Inc. 2013
Hadoop as Next-Gen Platform

HADOOP 1.0: Single Use System (Batch Apps)
• MapReduce (cluster resource management & data processing)
• HDFS (redundant, reliable storage)

HADOOP 2.0: Multi Purpose Platform (Batch, Interactive, Online, Streaming, …)
• MapReduce and Others (data processing)
• YARN (cluster resource management)
• HDFS2 (redundant, reliable storage)

YARN: Taking Hadoop Beyond Batch
Store ALL DATA in one place…
Interact with that data in MULTIPLE WAYS
with Predictable Performance and Quality of Service
Applications Run Natively IN Hadoop
• BATCH (MapReduce)
• INTERACTIVE (Tez)
• IN-MEMORY (Spark)
• STREAMING (Storm, S4, …)
• GRAPH (Giraph)
• HPC MPI (OpenMPI)
• OTHER (Search, Weave, …)
• Samza

YARN (Cluster Resource Management)
HDFS2 (Redundant, Reliable Storage)

And HBase?
[Diagram: the application stack on YARN (Cluster Resource Management) and HDFS2 (Redundant, Reliable Storage), with HBase added alongside MapReduce, Tez, Spark, Storm, S4, Giraph, OpenMPI, Search and Weave.]

Hoya: On-demand HBase clusters
1. Small HBase cluster in large YARN cluster
2. Dynamic HBase clusters

3. Self-healing HBase Cluster
4. Elastic HBase clusters
5. Transient/intermittent clusters for workflows
6. Custom versions & configurations
7. More efficient utilization/sharing of cluster

Goal: No code changes in HBase
• Today: none

But we'd like
• ZK reporting of web UI ports
• A way to get from a failed RS to its YARN container (a configurable ID is enough)

Hoya – the tool
• Hoya (HBase On YArn)
– Java tool
– Completely CLI driven

• Input: cluster description as JSON
– Specification of cluster: node options, ZK params
– Configuration generated
– Entire state persisted

• Actions: create, freeze/thaw, flex, exists <cluster>
• Can change cluster state later
– Add/remove nodes, started / stopped states
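
The actions listed above can be read as transitions between cluster states. The following is only a sketch of that reading, not Hoya's code: the class, the state names (`NONEXISTENT`, `LIVE`, `FROZEN`) and the rule that `flex` is accepted in both started and stopped states are assumptions made for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of the cluster lifecycle implied by the CLI actions. */
final class ClusterLifecycle {
    enum State { NONEXISTENT, LIVE, FROZEN }
    enum Action { CREATE, FREEZE, THAW, FLEX, EXISTS }

    /** Actions assumed to be legal in each state (an assumption, not taken from Hoya). */
    static Set<Action> allowed(State state) {
        switch (state) {
            case NONEXISTENT: return EnumSet.of(Action.CREATE, Action.EXISTS);
            case LIVE:        return EnumSet.of(Action.FREEZE, Action.FLEX, Action.EXISTS);
            case FROZEN:      return EnumSet.of(Action.THAW, Action.FLEX, Action.EXISTS);
            default:          throw new IllegalArgumentException("unknown state " + state);
        }
    }

    /** Returns the state after applying an action, or throws if the action is not allowed. */
    static State apply(State state, Action action) {
        if (!allowed(state).contains(action)) {
            throw new IllegalStateException(action + " is not allowed in state " + state);
        }
        switch (action) {
            case CREATE: return State.LIVE;   // create is assumed to start the cluster
            case FREEZE: return State.FROZEN; // stopped, with its state still persisted
            case THAW:   return State.LIVE;
            default:     return state;        // FLEX and EXISTS leave the state unchanged
        }
    }
}
```

In this model a frozen cluster keeps its persisted description, so `THAW` simply returns it to `LIVE`.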

YARN manages the cluster
• Servers run YARN Node Managers
• NMs heartbeat to the Resource Manager
• RM schedules work over the cluster
• RM allocates containers to apps
• NMs start containers
• NMs report container health

[Diagram: a YARN Resource Manager and several YARN Node Managers, shown alongside HDFS.]

Hoya Client creates App Master
[Diagram: the Hoya Client, the YARN Resource Manager, and YARN Node Managers alongside HDFS, with the Hoya AM now present in the cluster.]

AM deploys HBase with YARN
[Diagram: the Hoya Client, the YARN Resource Manager, and YARN Node Managers alongside HDFS, with the Hoya AM, an HBase Master and two HBase Region Servers now present.]

HBase & clients bind via Zookeeper
[Diagram: the Hoya Client, the YARN Resource Manager, and YARN Node Managers alongside HDFS, with the Hoya AM, an HBase Master, two HBase Region Servers, and an HBase Client.]

YARN notifies AM of failures
[Diagram: the Hoya Client, the YARN Resource Manager, and YARN Node Managers alongside HDFS, with the Hoya AM, an HBase Master, and three HBase Region Server boxes.]

HOYA - cool bits
• Cluster specification stored as JSON in HDFS
• Conf dir cached, dynamically patched before pushing up as local resources for master & region servers
• HBase .tar file stored in HDFS: clusters can use the same or different HBase versions
• Handling of cluster flexing is the same code as unplanned container loss
• No Hoya code on region servers

HOYA - AM RPC API
//change cluster role counts
flexCluster(ClusterSpec)
//get current cluster state
getJSONClusterStatus() : ClusterSpec

listNodeUUIDsByRole(role): UUID[]
getNode(UUID): RoleInfo
getClusterNodes(UUID[]) : RoleInfo[]
stopCluster()

Flexing/failure handling is same code
// Planned resize: validate and store the new spec, then reconcile.
boolean flexCluster(ClusterDescription updated) {
  providerService.validateClusterSpec(updated);
  appState.updateClusterSpec(updated);
  return reviewRequestAndReleaseNodes();
}

// Unplanned loss: record each completed container, then run the same reconcile step.
void onContainersCompleted(List<ContainerStatus> completed) {
  for (ContainerStatus status : completed) {
    appState.onCompletedNode(status);
  }
  reviewRequestAndReleaseNodes();
}
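
The slide does not show what `reviewRequestAndReleaseNodes()` does. One plausible reading is a reconcile step that compares the desired instance count of each role with the count actually running. The sketch below illustrates that idea only. `RoleReconciler` and all of its methods are hypothetical names, and it does not call any YARN API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical reconcile step: compare desired and actual instance counts per role. */
final class RoleReconciler {
    private final Map<String, Integer> desired = new LinkedHashMap<>();
    private final Map<String, Integer> actual = new LinkedHashMap<>();

    /** Planned change: the specification now asks for a different count. */
    Map<String, Integer> flex(String role, int count) {
        desired.put(role, count);
        return review();
    }

    /** Unplanned change: a running instance of the role has gone away. */
    Map<String, Integer> onInstanceLost(String role) {
        actual.computeIfPresent(role, (name, running) -> Math.max(0, running - 1));
        return review();
    }

    /** Records that a requested instance has started. */
    void onInstanceStarted(String role) {
        actual.merge(role, 1, Integer::sum);
    }

    /** Positive value: instances to request. Negative value: instances to release. */
    Map<String, Integer> review() {
        Map<String, Integer> delta = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : desired.entrySet()) {
            int diff = entry.getValue() - actual.getOrDefault(entry.getKey(), 0);
            if (diff != 0) {
                delta.put(entry.getKey(), diff);
            }
        }
        return delta;
    }
}
```

Both entry points end in the same `review()` call, which mirrors the slide's point that flexing and failure handling share one code path.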

Cluster Specification: persistent & wire
{
  "version" : "1.0",
  "name" : "TestLiveTwoNodeRegionService",
  "type" : "hbase",
  "options" : {
    "zookeeper.path" : "/yarnapps_hoya_stevel_live2nodes",
    "cluster.application.image.path" : "hdfs://bin/hbase-0.96.tar.gz",
    "zookeeper.hosts" : "127.0.0.1"
  },
  "roles" : {
    "worker" : {
      "role.instances" : "2"
    },
    "hoya" : {
      "role.instances" : "1"
    },
    "master" : {
      "role.instances" : "1"
    }
  },
  ...
}

Role Specifications
"roles" : {
  "worker" : {
    "yarn.memory" : "256",
    "role.instances" : "5",
    "jvm.heapsize" : "256",
    "yarn.vcores" : "1",
    "app.infoport" : "0",
    "env.MALLOC_ARENA_MAX": "4"
  },
  "master" : {
    "yarn.memory" : "128",
    "role.instances" : "1",
    "jvm.heapsize" : "128",
    "yarn.vcores" : "1",
    "app.infoport" : "8585"
  }
}
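
As a worked example of what such a role section asks of the cluster, the sketch below totals memory and vcores across roles. The option keys are the ones on the slide. The `RoleSizing` class and its methods are hypothetical, and the slide does not state the unit of `yarn.memory`, so the totals are reported in whatever unit the spec uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that totals the resources a "roles" section asks for. */
final class RoleSizing {

    /** Reads a required integer option, failing loudly if it is missing. */
    static int intOption(Map<String, String> options, String key) {
        String value = options.get(key);
        if (value == null) {
            throw new IllegalArgumentException("missing role option: " + key);
        }
        return Integer.parseInt(value.trim());
    }

    /** Sum over all roles of role.instances * yarn.memory. */
    static long totalMemory(Map<String, Map<String, String>> roles) {
        long total = 0;
        for (Map<String, String> options : roles.values()) {
            total += (long) intOption(options, "role.instances") * intOption(options, "yarn.memory");
        }
        return total;
    }

    /** Sum over all roles of role.instances * yarn.vcores. */
    static int totalVcores(Map<String, Map<String, String>> roles) {
        int total = 0;
        for (Map<String, String> options : roles.values()) {
            total += intOption(options, "role.instances") * intOption(options, "yarn.vcores");
        }
        return total;
    }

    /** Builds the worker and master roles with the values shown on the slide. */
    static Map<String, Map<String, String>> slideExample() {
        Map<String, String> worker = new LinkedHashMap<>();
        worker.put("yarn.memory", "256");
        worker.put("role.instances", "5");
        worker.put("yarn.vcores", "1");
        Map<String, String> master = new LinkedHashMap<>();
        master.put("yarn.memory", "128");
        master.put("role.instances", "1");
        master.put("yarn.vcores", "1");
        Map<String, Map<String, String>> roles = new LinkedHashMap<>();
        roles.put("worker", worker);
        roles.put("master", master);
        return roles;
    }
}
```

For the slide's values this gives 5 × 256 + 1 × 128 = 1408 units of memory and 6 vcores.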

Current status
• HBase clusters on-demand
• Accumulo clusters (5+ roles, different “provider”)

• Cluster freeze, thaw, flex, destroy
• Location of role instances tracked & persisted
– for placement close to data after failure, thaw
• Secure cluster support

Ongoing
• Multiple roles: worker, master, monitor
--role worker --roleopts worker yarn.vcores 2

• Multiple Providers: HBase + others
– client side: preflight, configuration patching
– server side: starting roles, liveness

• Liveness probes: HTTP GET, RPC port, RPC op?
• What do we need in YARN for production?
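
Two of the liveness probes mentioned above, a check that a port accepts connections and an HTTP GET, could look roughly like the sketch below. This is an illustration using only the JDK's `java.net` classes. The `LivenessProbes` class, its method names and the "any 2xx status counts as live" rule are assumptions, not Hoya's implementation.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;

/** Hypothetical liveness probes built on the JDK networking classes. */
final class LivenessProbes {

    /** Port probe: live if a TCP connection opens within the timeout. */
    static boolean portOpen(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** HTTP probe: live if a GET returns a 2xx status within the timeout. */
    static boolean httpGetOk(String url, int timeoutMs) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(timeoutMs);
            conn.setReadTimeout(timeoutMs);
            int code = conn.getResponseCode();
            return code >= 200 && code < 300;
        } catch (IOException | IllegalArgumentException | ClassCastException e) {
            // Unreachable host, malformed URL, or a non-HTTP URL all count as "not live".
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

A port probe only shows that something is listening. An HTTP GET or an RPC operation goes further and shows that the service is answering requests.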

Ongoing
• Better failure handling, blacklisting
• Liveness probes: HTTP GET, RPC port, RPC op?

• Testing: functional, scale & load
• What do we need in Hoya for production?
• What do we need in YARN for production?

Requirements of an App: MUST
• Install from tarball; run as normal user
• Deploy/start without human intervention
• Pre-configurable, static instance config data
• Support dynamic discovery/binding of peers
• Co-existence with other app instances in cluster/nodes
• Handle co-located role instances
• Persist data to HDFS
• Support 'kill' as a shutdown option
• Handle failed role instances
• Support role instances moving after failure

Requirements of an App: SHOULD
• Be configurable by Hadoop XML files
• Publish dynamically assigned web UI & RPC ports
• Support cluster flexing up/down
• Support API to determine role instance status
• Make it possible to determine role instance ID from app
• Support simple remote liveness probes

YARN-896: long-lived services
1. Container reconnect on AM restart
2. YARN Token renewal on long-lived apps

3. Containers: signalling, >1 process sequence
4. AM/RM managed gang scheduling
5. Anti-affinity hint in container requests
6. Service Registry - ZK?
7. Logging

SLAs & co-existence with MapReduce
1. Make IO bandwidth/IOPS a resource used in scheduling & limits
2. Need to monitor what's going on w.r.t. IO & net load from containers → apps → queues
3. Dynamic adaptation of cgroup HDD, Net, RAM limits
4. Could we throttle MR job File & HDFS IO bandwidth?

Hoya needs a home!

https://github.com/hortonworks/hoya
Questions?
hortonworks.com

© Hortonworks Inc

Page 27

Contenu connexe

Tendances

Galera Cluster: Synchronous Multi-Master Replication for MySQL HA
Galera Cluster: Synchronous Multi-Master Replication for MySQL HAGalera Cluster: Synchronous Multi-Master Replication for MySQL HA
Galera Cluster: Synchronous Multi-Master Replication for MySQL HALudovico Caldara
 
Understand oracle real application cluster
Understand oracle real application clusterUnderstand oracle real application cluster
Understand oracle real application clusterSatishbabu Gunukula
 
Bring your Service to YARN
Bring your Service to YARNBring your Service to YARN
Bring your Service to YARNDataWorks Summit
 
Policy based cluster management in oracle 12c
Policy based cluster management in oracle 12c Policy based cluster management in oracle 12c
Policy based cluster management in oracle 12c Anju Garg
 
Hadoop For Enterprises
Hadoop For EnterprisesHadoop For Enterprises
Hadoop For Enterprisesnvvrajesh
 
Boost your Oracle RAC manageability with Policy-Managed Databases
Boost your Oracle RAC manageability with Policy-Managed DatabasesBoost your Oracle RAC manageability with Policy-Managed Databases
Boost your Oracle RAC manageability with Policy-Managed DatabasesLudovico Caldara
 
Oracle RAC 12c and Policy-Managed Databases, a Technical Overview
Oracle RAC 12c and Policy-Managed Databases, a Technical OverviewOracle RAC 12c and Policy-Managed Databases, a Technical Overview
Oracle RAC 12c and Policy-Managed Databases, a Technical OverviewLudovico Caldara
 
D17316 gc20 l06_dataprot_logtrans
D17316 gc20 l06_dataprot_logtransD17316 gc20 l06_dataprot_logtrans
D17316 gc20 l06_dataprot_logtransMoeen_uddin
 
Strata Stinger Talk October 2013
Strata Stinger Talk October 2013Strata Stinger Talk October 2013
Strata Stinger Talk October 2013alanfgates
 
From docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native wayFrom docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native wayDataWorks Summit
 
Oracle 12c and its pluggable databases
Oracle 12c and its pluggable databasesOracle 12c and its pluggable databases
Oracle 12c and its pluggable databasesGustavo Rene Antunez
 
SQLIO - measuring storage performance
SQLIO - measuring storage performanceSQLIO - measuring storage performance
SQLIO - measuring storage performancevalerian_ceaus
 
From oracle to hadoop with Sqoop and other tools
From oracle to hadoop with Sqoop and other toolsFrom oracle to hadoop with Sqoop and other tools
From oracle to hadoop with Sqoop and other toolsGuy Harrison
 
Oracle 11g R2 RAC implementation and concept
Oracle 11g R2 RAC implementation and conceptOracle 11g R2 RAC implementation and concept
Oracle 11g R2 RAC implementation and conceptSantosh Kangane
 
Oracle 12c Multi Tenant
Oracle 12c Multi TenantOracle 12c Multi Tenant
Oracle 12c Multi TenantRed Stack Tech
 
Oracle database high availability solutions
Oracle database high availability solutionsOracle database high availability solutions
Oracle database high availability solutionsKirill Loifman
 
Oracle 12c PDB insights
Oracle 12c PDB insightsOracle 12c PDB insights
Oracle 12c PDB insightsKirill Loifman
 
D17316 gc20 l05_phys_sql
D17316 gc20 l05_phys_sqlD17316 gc20 l05_phys_sql
D17316 gc20 l05_phys_sqlMoeen_uddin
 
Oracle Real Application Cluster ( RAC )
Oracle Real Application Cluster ( RAC )Oracle Real Application Cluster ( RAC )
Oracle Real Application Cluster ( RAC )varasteh65
 

Tendances (20)

Galera Cluster: Synchronous Multi-Master Replication for MySQL HA
Galera Cluster: Synchronous Multi-Master Replication for MySQL HAGalera Cluster: Synchronous Multi-Master Replication for MySQL HA
Galera Cluster: Synchronous Multi-Master Replication for MySQL HA
 
Convert single instance to RAC
Convert single instance to RACConvert single instance to RAC
Convert single instance to RAC
 
Understand oracle real application cluster
Understand oracle real application clusterUnderstand oracle real application cluster
Understand oracle real application cluster
 
Bring your Service to YARN
Bring your Service to YARNBring your Service to YARN
Bring your Service to YARN
 
Policy based cluster management in oracle 12c
Policy based cluster management in oracle 12c Policy based cluster management in oracle 12c
Policy based cluster management in oracle 12c
 
Hadoop For Enterprises
Hadoop For EnterprisesHadoop For Enterprises
Hadoop For Enterprises
 
Boost your Oracle RAC manageability with Policy-Managed Databases
Boost your Oracle RAC manageability with Policy-Managed DatabasesBoost your Oracle RAC manageability with Policy-Managed Databases
Boost your Oracle RAC manageability with Policy-Managed Databases
 
Oracle RAC 12c and Policy-Managed Databases, a Technical Overview
Oracle RAC 12c and Policy-Managed Databases, a Technical OverviewOracle RAC 12c and Policy-Managed Databases, a Technical Overview
Oracle RAC 12c and Policy-Managed Databases, a Technical Overview
 
D17316 gc20 l06_dataprot_logtrans
D17316 gc20 l06_dataprot_logtransD17316 gc20 l06_dataprot_logtrans
D17316 gc20 l06_dataprot_logtrans
 
Strata Stinger Talk October 2013
Strata Stinger Talk October 2013Strata Stinger Talk October 2013
Strata Stinger Talk October 2013
 
From docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native wayFrom docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native way
 
Oracle 12c and its pluggable databases
Oracle 12c and its pluggable databasesOracle 12c and its pluggable databases
Oracle 12c and its pluggable databases
 
SQLIO - measuring storage performance
SQLIO - measuring storage performanceSQLIO - measuring storage performance
SQLIO - measuring storage performance
 
From oracle to hadoop with Sqoop and other tools
From oracle to hadoop with Sqoop and other toolsFrom oracle to hadoop with Sqoop and other tools
From oracle to hadoop with Sqoop and other tools
 
Oracle 11g R2 RAC implementation and concept
Oracle 11g R2 RAC implementation and conceptOracle 11g R2 RAC implementation and concept
Oracle 11g R2 RAC implementation and concept
 
Oracle 12c Multi Tenant
Oracle 12c Multi TenantOracle 12c Multi Tenant
Oracle 12c Multi Tenant
 
Oracle database high availability solutions
Oracle database high availability solutionsOracle database high availability solutions
Oracle database high availability solutions
 
Oracle 12c PDB insights
Oracle 12c PDB insightsOracle 12c PDB insights
Oracle 12c PDB insights
 
D17316 gc20 l05_phys_sql
D17316 gc20 l05_phys_sqlD17316 gc20 l05_phys_sql
D17316 gc20 l05_phys_sql
 
Oracle Real Application Cluster ( RAC )
Oracle Real Application Cluster ( RAC )Oracle Real Application Cluster ( RAC )
Oracle Real Application Cluster ( RAC )
 

En vedette

Taming Deployment With Smart Frog
Taming Deployment With Smart FrogTaming Deployment With Smart Frog
Taming Deployment With Smart FrogSteve Loughran
 
Economic Scheduling of Hadoop Jobs
Economic Scheduling of Hadoop JobsEconomic Scheduling of Hadoop Jobs
Economic Scheduling of Hadoop JobsSteve Loughran
 
Farms, Fabrics and Clouds
Farms, Fabrics and CloudsFarms, Fabrics and Clouds
Farms, Fabrics and CloudsSteve Loughran
 
Battle At Goliad
Battle At GoliadBattle At Goliad
Battle At Goliadcompd
 
HDP-1 introduction for HUG France
HDP-1 introduction for HUG FranceHDP-1 introduction for HUG France
HDP-1 introduction for HUG FranceSteve Loughran
 
My other computer_is_a_datacentre
My other computer_is_a_datacentreMy other computer_is_a_datacentre
My other computer_is_a_datacentreSteve Loughran
 
A New Approach To Organization
A New Approach To OrganizationA New Approach To Organization
A New Approach To Organizationcompd
 
Did you really want that data?
Did you really want that data?Did you really want that data?
Did you really want that data?Steve Loughran
 

En vedette (13)

Taming Deployment With Smart Frog
Taming Deployment With Smart FrogTaming Deployment With Smart Frog
Taming Deployment With Smart Frog
 
Economic Scheduling of Hadoop Jobs
Economic Scheduling of Hadoop JobsEconomic Scheduling of Hadoop Jobs
Economic Scheduling of Hadoop Jobs
 
Farms, Fabrics and Clouds
Farms, Fabrics and CloudsFarms, Fabrics and Clouds
Farms, Fabrics and Clouds
 
H is for_hadoop
H is for_hadoopH is for_hadoop
H is for_hadoop
 
Battle At Goliad
Battle At GoliadBattle At Goliad
Battle At Goliad
 
Extended essay overview
Extended essay overviewExtended essay overview
Extended essay overview
 
HDP-1 introduction for HUG France
HDP-1 introduction for HUG FranceHDP-1 introduction for HUG France
HDP-1 introduction for HUG France
 
My other computer_is_a_datacentre
My other computer_is_a_datacentreMy other computer_is_a_datacentre
My other computer_is_a_datacentre
 
A New Approach To Organization
A New Approach To OrganizationA New Approach To Organization
A New Approach To Organization
 
Graphs
GraphsGraphs
Graphs
 
Scholarly articles
Scholarly articlesScholarly articles
Scholarly articles
 
Echolocation
EcholocationEcholocation
Echolocation
 
Did you really want that data?
Did you really want that data?Did you really want that data?
Did you really want that data?
 

Similaire à 2013 11-19-hoya-status

How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopPOSSCON
 
Combine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNCombine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNHortonworks
 
Slider: Applications on YARN
Slider: Applications on YARNSlider: Applications on YARN
Slider: Applications on YARNSteve Loughran
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARNDataWorks Summit
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0Adam Muise
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSHortonworks
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnhdhappy001
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformBikas Saha
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopHortonworks
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with sparkHortonworks
 
Hadoop ha system admin
Hadoop ha system adminHadoop ha system admin
Hadoop ha system adminTrieu Dao Minh
 
Hadoop - Past, Present and Future - v1.2
Hadoop - Past, Present and Future - v1.2Hadoop - Past, Present and Future - v1.2
Hadoop - Past, Present and Future - v1.2Big Data Joe™ Rossi
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingDataWorks Summit
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN ApplicationsHortonworks
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Data Con LA
 
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveDiscover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveHortonworks
 

Similaire à 2013 11-19-hoya-status (20)

How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in Hadoop
 
Combine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNCombine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARN
 
Slider: Applications on YARN
Slider: Applications on YARNSlider: Applications on YARN
Slider: Applications on YARN
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARN
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
 
YARN Services
YARN ServicesYARN Services
YARN Services
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
 
Druid deep dive
Druid deep diveDruid deep dive
Druid deep dive
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute Platform
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
 
HBase with MapR
HBase with MapRHBase with MapR
HBase with MapR
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with spark
 
Hadoop ha system admin
Hadoop ha system adminHadoop ha system admin
Hadoop ha system admin
 
October 2014 HUG : Apache Slider
October 2014 HUG : Apache SliderOctober 2014 HUG : Apache Slider
October 2014 HUG : Apache Slider
 
Hadoop - Past, Present and Future - v1.2
Hadoop - Past, Present and Future - v1.2Hadoop - Past, Present and Future - v1.2
Hadoop - Past, Present and Future - v1.2
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN Applications
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
 
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveDiscover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
 

Plus de Steve Loughran

The age of rename() is over
The age of rename() is overThe age of rename() is over
The age of rename() is overSteve Loughran
 
What does Rename Do: (detailed version)
What does Rename Do: (detailed version)What does Rename Do: (detailed version)
What does Rename Do: (detailed version)Steve Loughran
 
Put is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit EditionPut is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit EditionSteve Loughran
 
@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!Steve Loughran
 
PUT is the new rename()
PUT is the new rename()PUT is the new rename()
PUT is the new rename()Steve Loughran
 
Extreme Programming Deployed
Extreme Programming DeployedExtreme Programming Deployed
Extreme Programming DeployedSteve Loughran
 
What does rename() do?
What does rename() do?What does rename() do?
What does rename() do?Steve Loughran
 
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and HiveDancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and HiveSteve Loughran
 
Apache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupApache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupSteve Loughran
 
Spark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSpark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSteve Loughran
 
Hadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresHadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresSteve Loughran
 
Apache Spark and Object Stores
Apache Spark and Object StoresApache Spark and Object Stores
Apache Spark and Object StoresSteve Loughran
 
Household INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony EraHousehold INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony EraSteve Loughran
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionSteve Loughran
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateSteve Loughran
 
Help! My Hadoop doesn't work!
Help! My Hadoop doesn't work!Help! My Hadoop doesn't work!
Help! My Hadoop doesn't work!Steve Loughran
 

Plus de Steve Loughran (20)

Hadoop Vectored IO
Hadoop Vectored IOHadoop Vectored IO
Hadoop Vectored IO
 
The age of rename() is over
The age of rename() is overThe age of rename() is over
The age of rename() is over
 
What does Rename Do: (detailed version)
What does Rename Do: (detailed version)What does Rename Do: (detailed version)
What does Rename Do: (detailed version)
 
Put is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit EditionPut is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit Edition
 
@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!
 
PUT is the new rename()
PUT is the new rename()PUT is the new rename()
PUT is the new rename()
 
Extreme Programming Deployed
Extreme Programming DeployedExtreme Programming Deployed
Extreme Programming Deployed
 
Testing
TestingTesting
Testing
 
I hate mocking
I hate mockingI hate mocking
I hate mocking
 
What does rename() do?
What does rename() do?What does rename() do?
What does rename() do?
 
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and HiveDancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
 
Apache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupApache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User Group
 
Spark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSpark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object stores
 
Hadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresHadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object Stores
 
Apache Spark and Object Stores
Apache Spark and Object StoresApache Spark and Object Stores
Apache Spark and Object Stores
 
Household INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony EraHousehold INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony Era
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the Gate
 
Datacentre stack
Datacentre stackDatacentre stack
Datacentre stack
 
Help! My Hadoop doesn't work!
Help! My Hadoop doesn't work!Help! My Hadoop doesn't work!
Help! My Hadoop doesn't work!
 

2013 11-19-hoya-status

  • 1. Hoya: HBase on YARN Steve Loughran & Devaraj Das {stevel, ddas} at hortonworks.com @steveloughran, @ddraj November 2013 © Hortonworks Inc. 2013
  • 2. Hadoop as Next-Gen Platform
      Hadoop 1.0, a single-use system for batch apps: MapReduce (cluster resource management & data processing) on HDFS (redundant, reliable storage).
      Hadoop 2.0, a multi-purpose platform for batch, interactive, online, streaming, … apps: MapReduce and others (data processing) on YARN (cluster resource management) on HDFS2 (redundant, reliable storage).
  • 3. YARN: Taking Hadoop Beyond Batch
      Store ALL DATA in one place… Interact with that data in MULTIPLE WAYS with Predictable Performance and Quality of Service.
      Applications Run Natively IN Hadoop: BATCH (MapReduce), INTERACTIVE (Tez), IN-MEMORY (Spark), STREAMING (Storm, S4, …), GRAPH (Giraph), HPC MPI (OpenMPI), OTHER (Search, Weave, …), Samza.
      All run on YARN (Cluster Resource Management) over HDFS2 (Redundant, Reliable Storage).
  • 4. And HBase?
      [Diagram, shown twice with HBase placed differently: BATCH (MapReduce), INTERACTIVE (Tez), IN-MEMORY (Spark), STREAMING (Storm, S4, …), GRAPH (Giraph), HPC MPI (OpenMPI), OTHER (Search, Weave, …) on YARN (Cluster Resource Management) over HDFS2 (Redundant, Reliable Storage), plus HBase.]
  • 6. Hoya: On-demand HBase clusters
      1. Small HBase cluster in large YARN cluster
      2. Dynamic HBase clusters
      3. Self-healing HBase cluster
      4. Elastic HBase clusters
      5. Transient/intermittent clusters for workflows
      6. Custom versions & configurations
      7. More efficient utilization/sharing of cluster
  • 7. Goal: No code changes in HBase
      • Today: none
      But we'd like
      • ZK reporting of web UI ports
      • A way to get from failed RS to YARN container (configurable ID is enough)
  • 8. Hoya – the tool
      • Hoya (HBase On YArn)
        – Java tool
        – Completely CLI driven
      • Input: cluster description as JSON
        – Specification of cluster: node options, ZK params
        – Configuration generated
        – Entire state persisted
      • Actions: create, freeze/thaw, flex, exists <cluster>
      • Can change cluster state later
        – Add/remove nodes, started / stopped states
  • 9. YARN manages the cluster
      • Servers run YARN Node Managers
      • NMs heartbeat to Resource Manager
      • RM schedules work over cluster
      • RM allocates containers to apps
      • NMs start containers
      • NMs report container health
      [Diagram: one YARN Resource Manager and several YARN Node Managers, each co-located with HDFS.]
  • 10. Hoya Client creates App Master
      [Diagram: Hoya Client, YARN Resource Manager, and YARN Node Managers co-located with HDFS; the Hoya AM runs on one Node Manager.]
  • 11. AM deploys HBase with YARN
      [Diagram: as slide 10, plus an HBase Master and two HBase Region Servers on the Node Managers.]
  • 12. HBase & clients bind via Zookeeper
      [Diagram: as slide 11, plus an HBase Client.]
  • 13. YARN notifies AM of failures
      [Diagram: as slide 11, with three HBase Region Servers shown.]
  • 14. HOYA - cool bits
      • Cluster specification stored as JSON in HDFS
      • Conf dir cached, dynamically patched before pushing up as local resources for master & region servers
      • HBase .tar file stored in HDFS - clusters can use the same/different HBase versions
      • Handling of cluster flexing is the same code as unplanned container loss
      • No Hoya code on region servers
  • 15. HOYA - AM RPC API
      //change cluster role counts
      flexCluster(ClusterSpec)

      //get current cluster state
      getJSONClusterStatus(): ClusterSpec
      listNodeUUIDsByRole(role): UUID[]
      getNode(UUID): RoleInfo
      getClusterNodes(UUID[]): RoleInfo[]

      stopCluster()
  • 16. Flexing/failure handling is same code
      boolean flexCluster(ClusterDescription updated) {
        providerService.validateClusterSpec(updated);
        appState.updateClusterSpec(updated);
        return reviewRequestAndReleaseNodes();
      }

      void onContainersCompleted(List<ContainerStatus> completed) {
        for (ContainerStatus status : completed) {
          appState.onCompletedNode(status);
        }
        reviewRequestAndReleaseNodes();
      }
  • 17. Cluster Specification: persistent & wire
      {
        "version" : "1.0",
        "name" : "TestLiveTwoNodeRegionService",
        "type" : "hbase",
        "options" : {
          "zookeeper.path" : "/yarnapps_hoya_stevel_live2nodes",
          "cluster.application.image.path" : "hdfs://bin/hbase-0.96.tar.gz",
          "zookeeper.hosts" : "127.0.0.1"
        },
        "roles" : {
          "worker" : { "role.instances" : "2" },
          "hoya" : { "role.instances" : "1" },
          "master" : { "role.instances" : "1" }
        },
        ...
      }
  • 18. Role Specifications
      "roles" : {
        "worker" : {
          "yarn.memory" : "256",
          "role.instances" : "5",
          "jvm.heapsize" : "256",
          "yarn.vcores" : "1",
          "app.infoport" : "0",
          "env.MALLOC_ARENA_MAX" : "4"
        },
        "master" : {
          "yarn.memory" : "128",
          "role.instances" : "1",
          "jvm.heapsize" : "128",
          "yarn.vcores" : "1",
          "app.infoport" : "8585"
        }
      }
  • 19. Current status
      • HBase clusters on-demand
      • Accumulo clusters (5+ roles, different “provider”)
      • Cluster freeze, thaw, flex, destroy
      • Location of role instances tracked & persisted
        – for placement close to data after failure, thaw
      • Secure cluster support
  • 20. Ongoing
      • Multiple roles: worker, master, monitor
        --role worker --roleopts worker yarn.vcores 2
      • Multiple Providers: HBase + others
        – client side: preflight, configuration patching
        – server side: starting roles, liveness
      • Liveness probes: HTTP GET, RPC port, RPC op?
      • What do we need in YARN for production?
  • 21. Ongoing
      • Better failure handling, blacklisting
      • Liveness probes: HTTP GET, RPC port, RPC op?
      • Testing: functional, scale & load
      • What do we need in Hoya for production?
      • What do we need in YARN for production?
  • 22. Requirements of an App: MUST
      • Install from tarball; run as normal user
      • Deploy/start without human intervention
      • Pre-configurable, static instance config data
      • Support dynamic discovery/binding of peers
      • Co-existence with other app instance in cluster/nodes
      • Handle co-located role instances
      • Persist data to HDFS
      • Support 'kill' as a shutdown option
      • Handle failed role instances
      • Support role instances moving after failure
  • 23. Requirements of an App: SHOULD
      • Be configurable by Hadoop XML files
      • Publish dynamically assigned web UI & RPC ports
      • Support cluster flexing up/down
      • Support API to determine role instance status
      • Make it possible to determine role instance ID from app
      • Support simple remote liveness probes
  • 24. YARN-896: long-lived services
      1. Container reconnect on AM restart
      2. YARN Token renewal on long-lived apps
      3. Containers: signalling, >1 process sequence
      4. AM/RM managed gang scheduling
      5. Anti-affinity hint in container requests
      6. Service Registry - ZK?
      7. Logging
  • 25. SLAs & co-existence with MapReduce
      1. Make IO bandwidth/IOPS a resource used in scheduling & limits
      2. Need to monitor what's going on w.r.t. IO & net load from containers → apps → queues
      3. Dynamic adaptation of cgroup HDD, Net, RAM limits
      4. Could we throttle MR job File & HDFS IO bandwidth?
  • 26. Hoya needs a home! https://github.com/hortonworks/hoya
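
Slide 16 states that flexing and failure handling share one code path. As a rough illustration of why that works, the sketch below keeps a desired and an actual instance count per role and runs the same review step after either event. `RoleReconciler` and its methods are hypothetical names invented for this example, not Hoya's classes, and the sketch assumes every container request or release takes effect immediately.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not Hoya's implementation: a single review step
 * serves both planned flexing and unplanned container loss.
 */
final class RoleReconciler {
    // Desired instance count per role, as set by the cluster specification.
    private final Map<String, Integer> desired = new LinkedHashMap<>();
    // Instances currently believed to be running, per role.
    private final Map<String, Integer> actual = new LinkedHashMap<>();

    /** Planned change: the user flexes a role to a new size. */
    public Map<String, Integer> flex(String role, int instances) {
        if (instances < 0) {
            throw new IllegalArgumentException("instances must be >= 0");
        }
        desired.put(role, instances);
        return review();
    }

    /** Unplanned change: a container of this role has completed. */
    public Map<String, Integer> onContainerCompleted(String role) {
        actual.put(role, Math.max(0, actual.getOrDefault(role, 0) - 1));
        return review();
    }

    /**
     * Compare desired and actual counts. A positive value means "request
     * this many containers", a negative value means "release this many".
     * Assumption: requests and releases are granted immediately, so the
     * actual count is set to the desired count after each review.
     */
    private Map<String, Integer> review() {
        Map<String, Integer> plan = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : desired.entrySet()) {
            int delta = e.getValue() - actual.getOrDefault(e.getKey(), 0);
            if (delta != 0) {
                plan.put(e.getKey(), delta);
            }
            actual.put(e.getKey(), e.getValue());
        }
        return plan;
    }
}
```

Under these assumptions, flexing "worker" to 2 yields a request for two containers, and a later container loss yields a request for one replacement from the same review step.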

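Slides 20 and 21 list "RPC port" among the candidate liveness probes. A minimal sketch of such a probe follows, assuming that a successful TCP connect within a timeout counts as live. `PortProbe` and `isPortOpen` are hypothetical names for this example, not part of Hoya or YARN.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical sketch of a port-based liveness probe. */
final class PortProbe {
    private PortProbe() {
    }

    /**
     * Returns true if a TCP connection to host:port succeeds within
     * timeoutMillis. Refused connections, timeouts, and unresolved hosts
     * all count as "not live".
     */
    public static boolean isPortOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

A connect-only probe shows that something is listening, not that the service can answer requests, which is presumably why the slides also raise HTTP GET and RPC operations as options.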
Editor's notes

  1. "hoya" is actually the Hoya AM: it lets us define the memory and requirements of that node in the cluster
  2. JMX port binding & publishing of port
     web port
     rolling restart of NM/RM
     AM retry logic
     Testing: chaos monkey for YARN
     Logging: running Samza without HDFS - costs of ops & latency. Are only running Samza clusters w/ YARN. YARN puts logs into HDFS by default, so without HDFS you are stuffed.
     - rollover of stdout & stderr
     - have the NM implement the rolling.
     - Samza could log from log4j to append to Kafka; need a way to pull out and view. Adds 15 min
     YARN can use http: URLs to pull in local resource
     Samza can handle outages of a few minutes, but for other services rolling restart/upgrade
     no classic scheduling; assume full code can run or buy more hardware
     - RM to tell AM when request can't be satisfied
  3. co-existenc