Apache Helix: Simplifying
Distributed Systems
Kanak Biscuitwala and Jason Zhang
helix.incubator.apache.org
@apachehelix
Outline
• Background
• Resource Assignment Problem
• Helix Concepts
• Putting Concepts to Work
• Getting Started
• Plugins
• Current Status
Building distributed systems is hard. Helix abstracts away the problems
distributed systems need to solve:
• Load balancing
• Responding to node entry and exit
• Alerting based on metrics
• Managing data replicas
• Supporting event listeners
System Lifecycle

Single Node → Multi-Node → Fault Tolerance → Cluster Expansion

Single node: partitioning, replication
Multi-node: discovery, co-location
Fault tolerance: fault detection, recovery
Cluster expansion: throttle movement, redistribute data
Resource Assignment Problem

The core problem: map RESOURCES onto NODES.

Sample allocation: with four nodes, each node serves 25% of the resources.

Failure handling: if one node fails, the remaining three nodes absorb the
load (33%, 33%, 34%).
Resource Assignment Problem
Making it Work: Take 1 (ZooKeeper)

Applications sit on top of ZooKeeper, the consensus system; Helix sits in
between. ZooKeeper provides low-level primitives: a file system, locks, and
ephemeral nodes. What we need are high-level primitives: node, partition,
replica, state, and transition.
Resource Assignment Problem
Making it Work: Take 2 (Decisions by Nodes)

Each service (S) running on a node watches the consensus system for config
changes and node changes, and writes node updates back to it.

Problems with this approach: multiple brains making decisions, app-specific
logic duplicated on every node, and unscalable traffic to the consensus
system.
Resource Assignment Problem
Making it Work: Take 3 (Single Brain)

A single Controller watches the consensus system for config changes and node
changes, and sends node updates to each service. Node logic is drastically
simplified!
Resource Assignment Problem
Helix View

Controllers manage the NODES (Participants) that host the RESOURCES;
Spectators observe the cluster.

Question: How do we make this controller generic enough to work for
different resources?
Helix Concepts
Resources

A resource is divided into partitions, and all partitions can be replicated.
Helix Concepts
Declarative State Model

Each replica moves between declared states, for example: Offline, Slave,
and Master.
Helix Concepts
Constraints: Augmenting the State Model

State Constraints
MASTER: [1, 1]
SLAVE: [0, R]

Special Constraint Values
R: replica count per partition
N: number of participants

Transition Constraints
Scope: Cluster — OFFLINE-SLAVE: 3 concurrent
Scope: Resource R1 — SLAVE-MASTER: 1 concurrent
Scope: Participant P4 — OFFLINE-SLAVE: 2 concurrent

States and transitions are ordered by priority in computing replica states.
Transition constraints can be restricted to cluster, resource, and
participant scopes. The most restrictive constraint is used.
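The state bounds above can be checked mechanically. The sketch below is
illustrative only (plain Java, not the Helix API): it validates one
partition's replica states against MASTER: [1, 1] and SLAVE: [0, R].

```java
import java.util.Collection;
import java.util.List;

// Illustrative sketch, not Helix code: check a partition's replica
// states against the augmented state model's bounds.
public class ConstraintCheck {
    // MASTER must appear exactly once; SLAVE between 0 and R times.
    public static boolean satisfies(Collection<String> states, int replicaCountR) {
        long masters = states.stream().filter("MASTER"::equals).count();
        long slaves = states.stream().filter("SLAVE"::equals).count();
        return masters == 1 && slaves <= replicaCountR;
    }

    public static void main(String[] args) {
        // One master and two slaves with R = 2 is legal...
        System.out.println(satisfies(List.of("MASTER", "SLAVE", "SLAVE"), 2)); // true
        // ...two masters never are.
        System.out.println(satisfies(List.of("MASTER", "MASTER"), 2)); // false
    }
}
```

The controller evaluates checks like this for every partition on every
cluster event, then picks transitions that move violating partitions back
within bounds.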
Helix Concepts
Resources and the Augmented State Model

A resource is divided into partitions, and all partitions can be replicated.
Each replica is in a state (offline, slave, or master) governed by the
augmented state model.
Helix Concepts
Objectives

Partition Placement
Distribution policy for partitions and replicas
Making effective use of the cluster and the resource

Failure and Expansion Semantics
Creating new replicas and assigning states
Changing existing replica states
Putting Concepts to Work
Rebalancing Strategies
Meeting Objectives within Constraints

Mode          Replica Placement                           Replica State
Full-Auto     Helix                                       Helix
Semi-Auto     App                                         Helix
Customized    App                                         App
User-Defined  App code plugged into the Helix controller  App code plugged into the Helix controller
Rebalancing Strategies
Full-Auto

Node 1: P1: M, P2: S
Node 2: P2: M, P3: S
Node 3: P3: M, P1: S

If a node fails, Helix moves its replicas to the surviving nodes and
reassigns states so each partition still has a master.

By default, Helix optimizes for minimal movement and even distribution of
partitions and states.
Rebalancing Strategies
Semi-Auto

Node 1: P1: M, P2: S
Node 2: P2: M, P3: S
Node 3: P3: M, P1: S

Semi-Auto mode maintains the location of the replicas, but allows Helix to
adjust the states to follow the state constraints. If Node 3 fails, the P3
slave on Node 2 is promoted to master; no data moves. This is ideal for
resources that are expensive to move.
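The semi-auto promotion rule can be sketched as follows. This is an
illustrative sketch under stated assumptions, not Helix internals: replica
locations come from a fixed preference list, and Helix only assigns states,
giving MASTER to the highest-priority live node and SLAVE to the rest.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of semi-auto state assignment (not Helix code).
public class SemiAuto {
    // preferenceList: nodes hosting this partition, highest priority first.
    // liveNodes: nodes currently up. Down nodes get no state at all.
    public static Map<String, String> states(List<String> preferenceList,
                                             Set<String> liveNodes) {
        Map<String, String> result = new LinkedHashMap<>();
        boolean masterAssigned = false;
        for (String node : preferenceList) {
            if (!liveNodes.contains(node)) continue; // failed node: skip
            result.put(node, masterAssigned ? "SLAVE" : "MASTER");
            masterAssigned = true;
        }
        return result;
    }

    public static void main(String[] args) {
        // P3 prefers Node3 as master, but Node3 is down: Node1 is promoted.
        System.out.println(states(List.of("Node3", "Node1"), Set.of("Node1", "Node2")));
    }
}
```

Because the preference list never changes, a failure triggers only state
transitions (slave → master), never data movement.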
Rebalancing Strategies
Customized

The app specifies the location and state of each replica. Helix still
ensures that transitions are fired according to constraints.

Need to respond to node changes? Use the Helix custom code invoker to run
on one participant, or...
Rebalancing Strategies
User-Defined

1. A node joins or leaves the cluster.
2. The Helix controller invokes code plugged in by the app.
3. A rebalancer implemented by the app computes replica placement and state.
4. Helix fires transitions without violating constraints.

The rebalancer receives a full snapshot of the current cluster state, as
well as access to the backing data store. Helix rebalancers implement the
same interface.
Rebalancing Strategies
User-Defined: Distributed Lock Manager

Each lock is a partition! Every lock replica is in the Released, Locked, or
Offline state, and locks are spread across the nodes. When a node (say,
Node 3) joins the cluster, the rebalancer redistributes locks to it.
Rebalancing Strategies
User-Defined: Distributed Lock Manager

public ResourceAssignment computeResourceMapping(
    Resource resource, IdealState currentIdealState,
    CurrentStateOutput currentStateOutput, ClusterDataCache clusterData) {
  ...
  int i = 0;
  for (Partition partition : resource.getPartitions()) {
    Map<String, String> replicaMap = new HashMap<String, String>();
    int participantIndex = i % liveParticipants.size();
    String participant = liveParticipants.get(participantIndex);
    replicaMap.put(participant, "LOCKED");
    assignment.addReplicaMap(partition, replicaMap);
    i++;
  }
  return assignment;
}
Controller
Fault Tolerance

Controllers themselves follow a state model: Offline, Standby, Leader. The
augmented state model concept applies to controllers too!
Controller
Scalability

Controllers scale by sharding clusters among themselves: for example,
Controller 1 manages Clusters 1–3, Controller 2 manages Clusters 4–5, and
Controller 3 manages Cluster 6.
ZooKeeper View
Ideal State

Replica placement and state: P1 → N1: M, N2: S; P2 → N2: M, N1: S

{
  "id" : "SampleResource",
  "simpleFields" : {
    "REBALANCE_MODE" : "USER_DEFINED",
    "NUM_PARTITIONS" : "2",
    "REPLICAS" : "2",
    "STATE_MODEL_DEF_REF" : "MasterSlave",
    "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
  },
  "mapFields" : {
    "SampleResource_0" : {
      "node1_12918" : "MASTER",
      "node2_12918" : "SLAVE"
    }
    ...
  },
  "listFields" : {}
}
ZooKeeper View
Current State and External View

Current State (reported per participant):
N1: P1: MASTER, P2: MASTER
N2: P1: OFFLINE, P2: OFFLINE

External View (aggregated per partition):
P1 → N1: M, N2: O
P2 → N1: M, N2: O

Helix’s responsibility is to make the external view match the ideal state
as closely as possible.
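That responsibility is, at its core, a diff between two maps. The sketch
below is illustrative only (plain maps, not the Helix API, which also
orders transitions by state priority and applies throttling): it compares
the ideal state against the external view and lists the transitions a
controller would need to fire.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative reconciliation sketch: partition -> (node -> state).
public class Reconciler {
    public static List<String> diff(Map<String, Map<String, String>> ideal,
                                    Map<String, Map<String, String>> current) {
        List<String> transitions = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : ideal.entrySet()) {
            String partition = e.getKey();
            Map<String, String> cur =
                current.getOrDefault(partition, Collections.emptyMap());
            for (Map.Entry<String, String> want : e.getValue().entrySet()) {
                // A node with no replica is treated as OFFLINE.
                String have = cur.getOrDefault(want.getKey(), "OFFLINE");
                if (!have.equals(want.getValue())) {
                    transitions.add(partition + "@" + want.getKey()
                        + ": " + have + "->" + want.getValue());
                }
            }
        }
        return transitions;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> ideal = new HashMap<>();
        ideal.put("P1", Map.of("N1", "MASTER", "N2", "SLAVE"));
        Map<String, Map<String, String>> current = new HashMap<>();
        current.put("P1", Map.of("N1", "MASTER")); // N2 has no replica yet
        System.out.println(diff(ideal, current));  // N2 must go OFFLINE->SLAVE
    }
}
```

Each cluster event (a node joining, a session expiring, a config change)
re-runs this comparison, so the external view converges toward the ideal
state.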
Logical Deployment

A Helix Controller and a Spectator, each embedding a Helix agent,
coordinate through ZooKeeper with the Participants. Each participant runs
a Helix agent alongside its replicas, e.g. P1: M and P2: S on the first
participant, P2: M and P3: S on the second, and P3: M and P1: S on the
third.
Getting Started
Example: Distributed Data Store

Twelve partitions (P.1–P.12) are spread across three nodes; each node holds
master replicas of some partitions and slave replicas of partitions
mastered elsewhere.

Partition Management
• multiple replicas
• 1 master
• even distribution

Fault Tolerance
• fault detection
• promote slave to master
• even distribution
• no SPOF

Elasticity
• minimize downtime
• minimize data movement
• throttle movement
Example: Distributed Data Store
Helix-Based Solution

Define: state model, state transitions
Configure: create cluster, add nodes, add resource, config rebalancer
Run: start controller, start participants
Example: Distributed Data Store
State Model Definition: Master-Slave

States: all possible states (Offline, Slave, Master), each with a priority
Transitions: legal transitions, each with a priority
Applicable to each partition of a resource
Example: Distributed Data Store
State Model Definition: Master-Slave

builder = new StateModelDefinition.Builder("MasterSlave");
// add states and their ranks to indicate priority
builder.addState(MASTER, 1);
builder.addState(SLAVE, 2);
builder.addState(OFFLINE);
// set the initial state when a participant starts
builder.initialState(OFFLINE);
// add transitions
builder.addTransition(OFFLINE, SLAVE);
builder.addTransition(SLAVE, OFFLINE);
builder.addTransition(SLAVE, MASTER);
builder.addTransition(MASTER, SLAVE);
Example: Distributed Data Store
Defining Constraints

State and transition constraints can be applied at several scopes:

Scope      State  Transition
Partition    Y        Y
Resource     -        Y
Node         Y        Y
Cluster      -        Y

For the Master-Slave model (Offline, Slave, Master), each partition has
StateCount=1 for Master and StateCount=2 for Slave.
Example: Distributed Data Store
Defining Constraints: Code

// static constraints
builder.upperBound(MASTER, 1);
// dynamic constraints
builder.dynamicUpperBound(SLAVE, "R");
// unconstrained
builder.upperBound(OFFLINE, -1);
Example: Distributed Data Store
Participant Plug-In Code

@StateModelInfo(initialState="OFFLINE", states={"OFFLINE", "SLAVE", "MASTER"})
class DistributedDataStoreModel extends StateModel {
  @Transition(from="OFFLINE", to="SLAVE")
  public void fromOfflineToSlave(Message m, NotificationContext ctx) {
    // bootstrap data, setup replication, etc.
  }

  @Transition(from="SLAVE", to="MASTER")
  public void fromSlaveToMaster(Message m, NotificationContext ctx) {
    // catch up from previous master, enable writes, etc.
  }
  ...
}
Example: Distributed Data Store
Configure and Run

HelixAdmin --zkSvr <zk-address>

Create Cluster:       --addCluster MyCluster
Add Participants:     --addNode MyCluster localhost_12000 ...
Add Resource:         --addResource MyDB 16 MasterSlave SEMI_AUTO
Configure Rebalancer: --rebalance MyDB 3
Example: Distributed Data Store
Spectator Plug-In Code

class RoutingLogic {
  public void write(Request request) {
    partition = getPartition(request.key);
    List<Node> nodes = routingTableProvider.getInstance(partition, "MASTER");
    nodes.get(0).write(request);
  }

  public void read(Request request) {
    partition = getPartition(request.key);
    List<Node> nodes = routingTableProvider.getInstance(partition);
    random(nodes).read(request);
  }
}
Example: Distributed Data Store
Where is the Code?

The participant plug-in code runs on each participant; the spectator
plug-in code runs on the spectator. The controller coordinates them all
through the consensus system (config changes, node changes, node updates).
Example: Distributed Search

Index shards P.1–P.6 are replicated across three nodes.

Partition Management
• multiple replicas
• rack-aware placement
• even distribution

Fault Tolerance
• fault detection
• auto-create replicas
• controlled creation of replicas

Elasticity
• redistribute partitions
• minimize data movement
• throttle movement
Example: Distributed Search
State Model Definition: Bootstrap

States: Idle, Offline, Bootstrap, Online, Error.
Transitions: setup node (Idle → Offline), recover (Offline → Bootstrap),
consume data to build index (Bootstrap → Online, where the node can serve
requests), stop consuming data (Bootstrap → Offline), stop indexing and
serving (Online → Offline), cleanup (Offline → Idle).
Constraints: StateCount=3 (Online) and StateCount=5 (Bootstrap) bound how
many replicas may occupy those states.
Example: Distributed Search
Configure and Run

Create Cluster:       --addCluster MyCluster
Add Participants:     --addNode MyCluster localhost_12000 ...
Add Resource:         --addResource MyIndex 16 Bootstrap CUSTOMIZED
Configure Rebalancer: --rebalance MyIndex 8
Example: Message Consumers

A partitioned consumer queue is assigned across consumers (C1, C2, C3).
The assignment scales as consumers join and is repaired when a consumer
fails.

Partition Management
• one consumer per queue
• even distribution

Elasticity
• redistribute queues among consumers
• minimize movement

Fault Tolerance
• redistribute
• minimize data movement
• limit max queues per consumer
Example: Message Consumers
State Model Definition: Online-Offline

Offline → (start consumption) → Online
Online → (stop consumption) → Offline
StateCount = 1, with a maximum of 10 queues per consumer.
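The assignment such a rebalancer would compute can be sketched as below.
This is an illustrative sketch, not Helix code: queues are dealt out
round-robin to live consumers, honoring the "max 10 queues per consumer"
constraint; all names are made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: assign partitioned queues to consumers.
public class QueueAssigner {
    static final int MAX_PER_CONSUMER = 10;

    public static Map<String, List<Integer>> assign(List<String> consumers,
                                                    int queueCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String c : consumers) assignment.put(c, new ArrayList<>());
        for (int q = 0; q < queueCount; q++) {
            String c = consumers.get(q % consumers.size()); // round-robin
            if (assignment.get(c).size() < MAX_PER_CONSUMER) {
                assignment.get(c).add(q); // over the cap: queue stays unassigned
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(assign(List.of("C1", "C2", "C3"), 6));
        // → {C1=[0, 3], C2=[1, 4], C3=[2, 5]}
    }
}
```

When a consumer leaves, re-running the same function over the surviving
consumer list yields the repaired assignment; Helix would then fire the
corresponding offline/online transitions.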
Example: Message Consumers
Participant Plug-In Code

@StateModelInfo(initialState="OFFLINE", states={"OFFLINE", "ONLINE"})
class MessageConsumerModel extends StateModel {
  @Transition(from="OFFLINE", to="ONLINE")
  public void fromOfflineToOnline(Message m, NotificationContext ctx) {
    // register listener
  }

  @Transition(from="ONLINE", to="OFFLINE")
  public void fromOnlineToOffline(Message m, NotificationContext ctx) {
    // unregister listener
  }
}
Plugins
Overview
• Data-driven testing and debugging
• Chaos Monkey
• Rolling upgrade
• On-demand task scheduling
• Intra-cluster messaging
• Health monitoring
Plugins
Data-Driven Testing and Debugging

Instrument ZK, controller, and participant logs → simulate execution with
Chaos Monkey → analyze invariants like state and transition constraints.

The exact sequence of events can be replayed: debugging made easy!
Plugins
Data-Driven Testing and Debugging: Sample Log File

timestamp    partition   participantName    sessionId                            state
1.32331E+12  TestDB_123  express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  OFFLINE
1.32331E+12  TestDB_123  express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  OFFLINE
1.32331E+12  TestDB_123  express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  OFFLINE
1.32331E+12  TestDB_91   express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  OFFLINE
1.32331E+12  TestDB_123  express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  SLAVE
1.32331E+12  TestDB_91   express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  OFFLINE
1.32331E+12  TestDB_123  express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  SLAVE
1.32331E+12  TestDB_91   express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  OFFLINE
1.32331E+12  TestDB_60   express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  OFFLINE
1.32331E+12  TestDB_123  express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  SLAVE
1.32331E+12  TestDB_91   express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  SLAVE
1.32331E+12  TestDB_60   express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  OFFLINE
1.32331E+12  TestDB_123  express1-md_16918  ef172fe9-09ca-4d77b05e-15a414478ccc  SLAVE
Plugins
Data-Driven Testing and Debugging: Count Aggregation

Time   State    Slave Count  Participant
42632  OFFLINE  0            10.117.58.247_12918
42796  SLAVE    1            10.117.58.247_12918
43124  OFFLINE  1            10.202.187.155_12918
43131  OFFLINE  1            10.220.225.153_12918
43275  SLAVE    2            10.220.225.153_12918
43323  SLAVE    3            10.202.187.155_12918
85795  MASTER   2            10.220.225.153_12918

Error! The state constraint for SLAVE has an upper bound of 2.
Plugins
Data-Driven Testing and Debugging: Time Aggregation

Slave Count  Time       Percentage
0            1082319    0.5
1            35578388   16.46
2            179417802  82.99
3            118863     0.05

Master Count  Time      Percentage
0             1082319   0.5
1             35578388  16.46

83% of the time, there were 2 slaves to a partition.
93% of the time, there was 1 master to a partition.

We can see for exactly how long the cluster was out of whack.
Current Status
Helix at LinkedIn

The primary DB (Oracle and Espresso, with read replicas) feeds data change
events through Databus to downstream consumers: the search index, graph
index, and standardization services.
Coming Up Next
• New APIs
• Automatic scaling with YARN
• Non-JVM participants
Summary
• Helix: a generic framework for building distributed systems
• Abstraction and modularity allow for modifying and enhancing system behavior
• Simple programming model: declarative state machine
Questions?

website:           helix.incubator.apache.org
dev mailing list:  dev@helix.incubator.apache.org
user mailing list: user@helix.incubator.apache.org
twitter:           @apachehelix
Helix talk at RelateIQ

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

Helix talk at RelateIQ

  • 1. Apache Helix: Simplifying Distributed Systems Kanak Biscuitwala and Jason Zhang helix.incubator.apache.org @apachehelix
  • 2. Outline • • • • • • • Background Resource Assignment Problem Helix Concepts Putting Concepts to Work Getting Started Plugins Current Status
  • 3. Load balancing Responding to node entry and exit Building distributed systems is hard. Alerting based on metrics Managing data replicas Supporting event listeners
  • 4. Load balancing Responding to node entry and exit Helix abstracts away problems distributed systems need to solve. Alerting based on metrics Managing data replicas Supporting event listeners
  • 5. System Lifecycle Cluster Expansion Fault Tolerance Multi-Node Partitioning Discovery Co-Location Single Node Replication Fault Detection Recovery Throttle movement Redistribute data
  • 8. Resource Assignment Problem Sample Allocation RESOURCES NODES 25% 25% 25% 25%
  • 9. Resource Assignment Problem Failure Handling RESOURCES NODES 33% 33% 34%
  • 10. Resource Assignment Problem Making it Work: Take 1 (ZooKeeper) Application Application Helix ZooKeeper File system Lock Ephemeral ZooKeeper provides low-level primitives Node Partition Replica State Transition Consensus System We need high-level primitives
  • 11. Resource Assignment Problem Making it Work: Take 2 (Decisions by Nodes) S config changes node changes node updates S Consensus System S S service running on a node S
  • 12. Resource Assignment Problem Making it Work: Take 2 (Decisions by Nodes) S config changes node changes node updates S Consensus System multiple brains S app-specific logic unscalable traffic S
  • 13. Resource Assignment Problem Making it Work: Take 2 (Decisions by Nodes) S config changes node changes node updates S Consensus System multiple brains S app-specific logic unscalable traffic S
  • 14. Resource Assignment Problem Making it Work: Take 3 (Single Brain) S node updates node updates Controller config changes node changes Consensus System S Node logic is drastically simplified! S
  • 15. Resource Assignment Problem Helix View RESOURCES Controller Controller Controller Manage NODES (Participants) Spectators
  • 16. Resource Assignment Problem Helix View RESOURCES Controller Controller Controller Manage NODES (Participants) Spectators Question: How do we make this controller generic enough to work for different resources?
  • 19. Helix Concepts Declarative State Model Offline Slave Master
  • 20. Helix Concepts Constraints: Augmenting the State Model State Constraints MASTER: [1, 1] SLAVE: [0, R] Special Constraint Values R: Replica count per partition N: Number of participants
  • 21. Helix Concepts Constraints: Augmenting the State Model State Constraints MASTER: [1, 1] SLAVE: [0, R] Special Constraint Values R: Replica count per partition N: Number of participants Transition Constraints Scope: Cluster OFFLINE-SLAVE: 3 concurrent Scope: Resource R1 SLAVE-MASTER: 1 concurrent Scope: Participant P4 OFFLINE-SLAVE: 2 concurrent
  • 22. Helix Concepts Constraints: Augmenting the State Model State Constraints MASTER: [1, 1] SLAVE: [0, R] Special Constraint Values R: Replica count per partition N: Number of participants Transition Constraints Scope: Cluster OFFLINE-SLAVE: 3 concurrent Scope: Resource R1 SLAVE-MASTER: 1 concurrent Scope: Participant P4 OFFLINE-SLAVE: 2 concurrent States and transitions are ordered by priority in computing replica states. Transition constraints can be restricted to cluster, resource, and participant scopes. The most restrictive constraint is used.
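The priority-ordered state constraints above (MASTER: [1, 1], SLAVE: [0, R]) can be sketched as a simple allocation pass: walk the states in priority order and give each one as many replicas as its upper bound allows. This is an illustrative standalone example, not Helix's implementation; the `assign` helper and class name are hypothetical.

```java
// Sketch: resolving per-partition state counts from priority-ordered
// constraints. "R" as an upper bound means "replica count".
import java.util.*;

public class StateCounts {
  static Map<String, Integer> assign(List<String> statesByPriority,
                                     Map<String, String> upperBounds,
                                     int replicas) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    int remaining = replicas;
    for (String state : statesByPriority) {
      String bound = upperBounds.get(state);
      int max = bound.equals("R") ? replicas : Integer.parseInt(bound);
      int take = Math.min(max, remaining);  // higher-priority states are filled first
      counts.put(state, take);
      remaining -= take;
    }
    return counts;
  }

  public static void main(String[] args) {
    // MASTER: [1, 1], SLAVE: [0, R] with 3 replicas -> 1 master, 2 slaves
    Map<String, Integer> c = assign(Arrays.asList("MASTER", "SLAVE"),
        Map.of("MASTER", "1", "SLAVE", "R"), 3);
    System.out.println(c); // {MASTER=1, SLAVE=2}
  }
}
```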
  • 23. Helix Concepts Resources and the Augmented State Model master Resource slave offline Partition Partition Partition All partitions can be replicated. Each replica is in a state governed by the augmented state model.
  • 24. Helix Concepts Objectives Partition Placement Distribution policy for partitions and replicas Making effective use of the cluster and the resource Failure and Expansion Semantics Create new replicas and assign states Changing existing replica states
  • 26. Rebalancing Strategies Meeting Objectives within Constraints Full-Auto Replica Placement Replica State Helix Helix Semi-Auto App Helix Customized User-Defined App App code plugged into the Helix controller App App code plugged into the Helix controller
  • 27. Rebalancing Strategies Full-Auto Node 1 Node 2 Node 3 P1: M P2: M P3: M P2: S P3: S P1: S By default, Helix optimizes for minimal movement and even distribution of partitions and states
  • 28. Rebalancing Strategies Full-Auto Node 1 Node 2 Node 3 P1: M P2: M P3: M P2: S P3: S P1: S By default, Helix optimizes for minimal movement and even distribution of partitions and states
  • 29. Rebalancing Strategies Full-Auto Node 1 Node 2 P1: M P2: M P2: S P3: S P3: M Node 3 P1: S By default, Helix optimizes for minimal movement and even distribution of partitions and states
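A minimal sketch of the "minimal movement" idea in Full-Auto mode, assuming a greedy strategy that reassigns only the partitions whose host left, placing each on the least-loaded live node. This is illustrative only, not Helix's actual rebalancing algorithm.

```java
// Sketch: when a node leaves, touch only its orphaned partitions;
// everything already on a live node stays put.
import java.util.*;

public class MinimalMovement {
  static Map<String, String> rebalance(Map<String, String> placement, Set<String> liveNodes) {
    Map<String, Integer> load = new HashMap<>();
    for (String n : liveNodes) load.put(n, 0);
    Map<String, String> result = new TreeMap<>();
    List<String> orphaned = new ArrayList<>();
    for (Map.Entry<String, String> e : placement.entrySet()) {
      if (liveNodes.contains(e.getValue())) {
        result.put(e.getKey(), e.getValue());  // unchanged: no movement
        load.merge(e.getValue(), 1, Integer::sum);
      } else {
        orphaned.add(e.getKey());              // host is gone: must move
      }
    }
    for (String p : orphaned) {
      String target = Collections.min(load.keySet(), Comparator.comparingInt(load::get));
      result.put(p, target);
      load.merge(target, 1, Integer::sum);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> placement = new TreeMap<>();
    placement.put("P1", "node1"); placement.put("P2", "node2"); placement.put("P3", "node3");
    // node3 fails: only P3 moves, P1 and P2 stay where they are
    System.out.println(rebalance(placement, new TreeSet<>(Arrays.asList("node1", "node2"))));
  }
}
```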
  • 30. Rebalancing Strategies Semi-Auto Node 1 Node 2 Node 3 P1: M P2: M P3: M P2: S P3: S P1: S Semi-Auto mode maintains the location of the replicas, but allows Helix to adjust the states to follow the state constraints. This is ideal for resources that are expensive to move.
  • 31. Rebalancing Strategies Semi-Auto Node 1 Node 2 Node 3 P1: M P2: M P3: M P2: S P3: S P1: S Semi-Auto mode maintains the location of the replicas, but allows Helix to adjust the states to follow the state constraints. This is ideal for resources that are expensive to move.
  • 32. Rebalancing Strategies Semi-Auto Node 1 Node 2 Node 3 P1: M P2: M P3: M P2: S P3: M P3: S P1: S Semi-Auto mode maintains the location of the replicas, but allows Helix to adjust the states to follow the state constraints. This is ideal for resources that are expensive to move.
  • 33. Rebalancing Strategies Customized The app specifies the location and state of each replica. Helix still ensures that transitions are fired according to constraints.
  • 34. Rebalancing Strategies Customized The app specifies the location and state of each replica. Helix still ensures that transitions are fired according to constraints. Need to respond to node changes? Use the Helix custom code invoker to run on one participant, or...
  • 35. Rebalancing Strategies User-Defined Node joins or leaves the cluster Helix controller invokes code plugged in by the app Rebalancer implemented by app computes replica placement and state Helix fires transitions without violating constraints The rebalancer receives a full snapshot of the current cluster state, as well as access to the backing data store. Helix rebalancers implement the same interface.
  • 36. Rebalancing Strategies User-Defined: Distributed Lock Manager Node 1 Released Node 2 Offline Locked Each lock is a partition!
  • 37. Rebalancing Strategies User-Defined: Distributed Lock Manager Node 1 Node 3 Released Node 2 Offline Locked Each lock is a partition!
  • 38. Rebalancing Strategies User-Defined: Distributed Lock Manager

        public ResourceAssignment computeResourceMapping(
                Resource resource, IdealState currentIdealState,
                CurrentStateOutput currentStateOutput,
                ClusterDataCache clusterData) {
            ...
            int i = 0;
            for (Partition partition : resource.getPartitions()) {
                Map<String, String> replicaMap = new HashMap<String, String>();
                int participantIndex = i % liveParticipants.size();
                String participant = liveParticipants.get(participantIndex);
                replicaMap.put(participant, "LOCKED");
                assignment.addReplicaMap(partition, replicaMap);
                i++;
            }
            return assignment;
        }

  • 39. Rebalancing Strategies User-Defined: Distributed Lock Manager (same code as the previous slide)
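The round-robin assignment from the lock-manager slides can be exercised without the Helix types. The sketch below is a self-contained stand-in: lock i goes to participant (i mod N), which is what the slide's rebalancer computes; class and method names here are hypothetical.

```java
// Standalone version of the slide's round-robin lock assignment.
import java.util.*;

public class LockAssignment {
  static Map<String, String> computeMapping(List<String> locks, List<String> liveParticipants) {
    Map<String, String> assignment = new LinkedHashMap<>();
    int i = 0;
    for (String lock : locks) {
      // each lock (partition) lands on participant (i mod N), in state "LOCKED"
      String participant = liveParticipants.get(i % liveParticipants.size());
      assignment.put(lock, participant);
      i++;
    }
    return assignment;
  }

  public static void main(String[] args) {
    System.out.println(computeMapping(
        Arrays.asList("lock_0", "lock_1", "lock_2"),
        Arrays.asList("n1", "n2")));
    // {lock_0=n1, lock_1=n2, lock_2=n1}
  }
}
```

When a participant joins or leaves, the modulo shifts and Helix fires the transitions needed to move the affected locks.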
  • 40. Controller Fault Tolerance Offline Standby Leader The augmented state model concept applies to controllers too!
  • 41. Controller Scalability Controller 1 Cluster 1 Cluster 2 Cluster 3 Controller 2 Cluster 4 Cluster 5 Controller 3 Cluster 6
  • 42. Controller Scalability Controller 1 Cluster 1 Cluster 2 Cluster 3 Controller 2 Cluster 4 Cluster 5 Controller 3 Cluster 6
  • 43. ZooKeeper View Ideal State P1 P2 N1: M N2: M N2: S N1: S Replica Placement and State

        {
          "id" : "SampleResource",
          "simpleFields" : {
            "REBALANCE_MODE" : "USER_DEFINED",
            "NUM_PARTITIONS" : "2",
            "REPLICAS" : "2",
            "STATE_MODEL_DEF_REF" : "MasterSlave",
            "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
          },
          "mapFields" : {
            "SampleResource_0" : {
              "node1_12918" : "MASTER",
              "node2_12918" : "SLAVE"
            }
            ...
          },
          "listFields" : {}
        }
  • 44. ZooKeeper View Current State and External View External View Current State P1 P2 N1 P1: MASTER P2: MASTER N1: M N1: M N2 P1: OFFLINE P2: OFFLINE N2: O N2: O Helix’s responsibility is to make the external view match the ideal state as closely as possible
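Conceptually, the controller converges the external view toward the ideal state by finding every replica whose current state differs from its target and firing a transition for it. A minimal standalone sketch of that diff (hypothetical names, not Helix code):

```java
// Sketch: diffing ideal state against external view to find the
// transitions the controller still needs to fire.
import java.util.*;

public class StateDiff {
  // keys are "partition/node", values are states
  static List<String> pendingTransitions(Map<String, String> ideal,
                                         Map<String, String> external) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, String> e : ideal.entrySet()) {
      // a replica missing from the external view is treated as OFFLINE
      String current = external.getOrDefault(e.getKey(), "OFFLINE");
      if (!current.equals(e.getValue())) {
        out.add(e.getKey() + ": " + current + " -> " + e.getValue());
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, String> ideal = new TreeMap<>();
    ideal.put("P1/N1", "MASTER"); ideal.put("P1/N2", "SLAVE");
    Map<String, String> external = new TreeMap<>();
    external.put("P1/N1", "MASTER"); external.put("P1/N2", "OFFLINE");
    System.out.println(pendingTransitions(ideal, external));
    // [P1/N2: OFFLINE -> SLAVE]
  }
}
```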
  • 45. Logical Deployment Spectator Helix Agent ZooKeeper Helix Controller Helix Agent Helix Agent Helix Agent P1: M P2: S Participant P2: M P3: S Participant P3: M P1: S Participant
  • 47. Example: Distributed Data Store Master P.1 P.2 P.3 P.5 P.6 P.7 P.9 P.10 P.11 Slave P.4 P.5 P.6 P.8 P.1 P.2 P.12 P.3 P.4 P.9 P.10 P.11 P.12 P.7 P.8 Node 1 Partition Management • multiple replicas • 1 master • even distribution Node 2 Fault Tolerance • fault detection • promote master to slave • even distribution • no SPOF Node 3 Elasticity • minimize downtime • minimize data movement • throttle movement
  • 48. Example: Distributed Data Store Helix-Based Solution Define state model state transitions Configure create cluster add nodes add resource config rebalancer Run start controller start participants
  • 49. Example: Distributed Data Store State Model Definition: Master-Slave States all possible states priority Transitions legal transitions priority Applicable to each partition of a resource Slave Offline Master
  • 50. Example: Distributed Data Store State Model Definition: Master-Slave

        builder = new StateModelDefinition.Builder("MasterSlave");
        // add states and their ranks to indicate priority
        builder.addState(MASTER, 1);
        builder.addState(SLAVE, 2);
        builder.addState(OFFLINE);
        // set the initial state when participant starts
        builder.initialState(OFFLINE);
        // add transitions
        builder.addTransition(OFFLINE, SLAVE);
        builder.addTransition(SLAVE, OFFLINE);
        builder.addTransition(SLAVE, MASTER);
        builder.addTransition(MASTER, SLAVE);
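As a standalone illustration of what such a state-model definition encodes, the table below checks transition legality: note that OFFLINE cannot jump straight to MASTER, it must pass through SLAVE. Names here are hypothetical, not a Helix API.

```java
// Sketch: the Master-Slave transition table as a lookup, with a
// legality check like the one the controller enforces.
import java.util.*;

public class TransitionCheck {
  static final Map<String, Set<String>> LEGAL = new HashMap<>();
  static {
    LEGAL.put("OFFLINE", new HashSet<>(Arrays.asList("SLAVE")));
    LEGAL.put("SLAVE", new HashSet<>(Arrays.asList("OFFLINE", "MASTER")));
    LEGAL.put("MASTER", new HashSet<>(Arrays.asList("SLAVE")));
  }

  static boolean isLegal(String from, String to) {
    return LEGAL.getOrDefault(from, Collections.emptySet()).contains(to);
  }

  public static void main(String[] args) {
    System.out.println(isLegal("SLAVE", "MASTER"));   // true
    System.out.println(isLegal("OFFLINE", "MASTER")); // false: must go through SLAVE
  }
}
```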
  • 51. Example: Distributed Data Store Defining Constraints

        Scope      State  Transition
        Partition  Y      Y
        Resource   -      Y
        Node       Y      Y
        Cluster    -      Y

        State diagram: Offline, Slave (StateCount=2), Master (StateCount=1)
  • 52. Example: Distributed Data Store Defining Constraints: Code

        // static constraints
        builder.upperBound(MASTER, 1);
        // dynamic constraints
        builder.dynamicUpperBound(SLAVE, "R");
        // unconstrained
        builder.upperBound(OFFLINE, -1);
  • 53. Example: Distributed Data Store Participant Plug-In Code

        @StateModelInfo(initialState = "OFFLINE", states = {"OFFLINE", "SLAVE", "MASTER"})
        class DistributedDataStoreModel extends StateModel {
            @Transition(from = "OFFLINE", to = "SLAVE")
            public void fromOfflineToSlave(Message m, NotificationContext ctx) {
                // bootstrap data, setup replication, etc.
            }

            @Transition(from = "SLAVE", to = "MASTER")
            public void fromSlaveToMaster(Message m, NotificationContext ctx) {
                // catch up previous master, enable writes, etc.
            }
            ...
        }
  • 54. Example: Distributed Data Store Configure and Run

        HelixAdmin -zkSvr <zk-address>
        Create Cluster:        --addCluster MyCluster
        Add Participants:      --addNode MyCluster localhost_12000 ...
        Add Resource:          --addResource MyDB 16 MasterSlave SEMI_AUTO
        Configure Rebalancer:  --rebalance MyDB 3
  • 55. Example: Distributed Data Store Spectator Plug-In Code

        class RoutingLogic {
            public void write(Request request) {
                partition = getPartition(request.key);
                List<Node> nodes = routingTableProvider.getInstance(partition, "MASTER");
                nodes.get(0).write(request);
            }

            public void read(Request request) {
                partition = getPartition(request.key);
                List<Node> nodes = routingTableProvider.getInstance(partition);
                random(nodes).read(request);
            }
        }
  • 56. Example: Distributed Data Store Where is the Code? Participant Participant Plug-In Code node updates node updates Controller config changes node changes Consensus System Spectator Participant Plug-In Code Participant Spectator Plug-In Code
  • 57. Example: Distributed Search Index shard P.1 P.2 P.3 P.4 P.3 P.4 P.5 P.6 Node 1 Partition Management • multiple replicas • rack-aware placement • even distribution P.5 P.6 P.1 P.2 Node 2 Fault Tolerance • fault detection • auto create replicas • controlled creation of replicas Node 3 Elasticity • redistribute partitions • minimize data movement • throttle movement
  • 58. Example: Distributed Search State Model Definition: Bootstrap Idle setup node cleanup recover Offline stop consume data StateCount=3 stop indexing and serving Online consume data to build index can serve requests Bootstrap Error StateCount=5
  • 59. Example: Distributed Search Configure and Run

        Create Cluster:        --addCluster MyCluster
        Add Participants:      --addNode MyCluster localhost_12000 ...
        Add Resource:          --addResource MyIndex 16 Bootstrap CUSTOMIZED
        Configure Rebalancer:  --rebalance MyIndex 8
  • 60. Example: Message Consumers Assignment Scaling Partitioned Consumer Queue Partitioned Consumer Queue C1 C1 Fault Tolerance Partitioned Consumer Queue C1 C3 C2 Partition Management • one consumer per queue • even distribution C3 C2 C2 Elasticity • redistribute queues among consumers • minimize movement Fault Tolerance • redistribute • minimize data movement • limit max queue per consumer
  • 61. Example: Message Consumers State Model Definition: Online-Offline Max 10 queues per consumer StateCount = 1 Start consumption Offline Online Stop consumption
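The consumer-assignment constraints above (one consumer per queue, even distribution, at most 10 queues per consumer) can be sketched as a round-robin spread with a capacity check. This is an illustrative standalone example, not the Helix rebalancer; all names are hypothetical.

```java
// Sketch: spreading queue partitions over consumers evenly, while
// enforcing the "max queues per consumer" cap from the slide.
import java.util.*;

public class QueueAssignment {
  static Map<String, List<Integer>> assign(int queues, List<String> consumers,
                                           int maxPerConsumer) {
    if (queues > consumers.size() * maxPerConsumer) {
      throw new IllegalArgumentException("not enough consumer capacity");
    }
    Map<String, List<Integer>> out = new LinkedHashMap<>();
    for (String c : consumers) out.put(c, new ArrayList<>());
    for (int q = 0; q < queues; q++) {
      // round-robin keeps the distribution even
      out.get(consumers.get(q % consumers.size())).add(q);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(assign(5, Arrays.asList("c1", "c2"), 10));
    // {c1=[0, 2, 4], c2=[1, 3]}
  }
}
```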
  • 62. Example: Message Consumers Participant Plug-In Code

        @StateModelInfo(initialState = "OFFLINE", states = {"OFFLINE", "ONLINE"})
        class MessageConsumerModel extends StateModel {
            @Transition(from = "OFFLINE", to = "ONLINE")
            public void fromOfflineToOnline(Message m, NotificationContext ctx) {
                // register listener
            }

            @Transition(from = "ONLINE", to = "OFFLINE")
            public void fromOnlineToOffline(Message m, NotificationContext ctx) {
                // unregister listener
            }
        }
  • 65. Plugins Data-Driven Testing and Debugging Instrument ZK, controller, and participant logs Simulate execution with Chaos Monkey Analyze invariants like state and transition constraints The exact sequence of events can be replayed: debugging made easy!
  • 66. Plugins Data-Driven Testing and Debugging: Sample Log File

        timestamp    partition   participantName    sessionId                             state
        1.32331E+12  TestDB_123  express1-md_16918  ef172fe9-09ca-4d77-b05e-15a414478ccc  OFFLINE
        1.32331E+12  TestDB_123  express1-md_16918  ef172fe9-09ca-4d77-b05e-15a414478ccc  OFFLINE
        1.32331E+12  TestDB_123  express1-md_16918  ef172fe9-09ca-4d77-b05e-15a414478ccc  OFFLINE
        1.32331E+12  TestDB_91   express1-md_16918  ef172fe9-09ca-4d77-b05e-15a414478ccc  OFFLINE
        1.32331E+12  TestDB_123  express1-md_16918  ef172fe9-09ca-4d77-b05e-15a414478ccc  SLAVE
        1.32331E+12  TestDB_91   express1-md_16918  ef172fe9-09ca-4d77-b05e-15a414478ccc  OFFLINE
        1.32331E+12  TestDB_123  express1-md_16918  ef172fe9-09ca-4d77-b05e-15a414478ccc  SLAVE
        1.32331E+12  TestDB_91   express1-md_16918  ef172fe9-09ca-4d77-b05e-15a414478ccc  OFFLINE
        1.32331E+12  TestDB_60   express1-md_16918  ef172fe9-09ca-4d77-b05e-15a414478ccc  OFFLINE
        1.32331E+12  TestDB_123  express1-md_16918  ef172fe9-09ca-4d77-b05e-15a414478ccc  SLAVE
        1.32331E+12  TestDB_91   express1-md_16918  ef172fe9-09ca-4d77-b05e-15a414478ccc  SLAVE
        1.32331E+12  TestDB_60   express1-md_16918  ef172fe9-09ca-4d77-b05e-15a414478ccc  OFFLINE
        1.32331E+12  TestDB_123  express1-md_16918  ef172fe9-09ca-4d77-b05e-15a414478ccc  SLAVE
  • 67. Plugins Data-Driven Testing and Debugging: Count Aggregation

        Time   State    Slave Count  Participant
        42632  OFFLINE  0            10.117.58.247_12918
        42796  SLAVE    1            10.117.58.247_12918
        43124  OFFLINE  1            10.202.187.155_12918
        43131  OFFLINE  1            10.220.225.153_12918
        43275  SLAVE    2            10.220.225.153_12918
        43323  SLAVE    3            10.202.187.155_12918
        85795  MASTER   2            10.220.225.153_12918

        Error! The state constraint for SLAVE has an upper bound of 2.
  • 68. Plugins Data-Driven Testing and Debugging: Time Aggregation

        Slave Count   Time       Percentage
        0             1082319    0.5
        1             35578388   16.46
        2             179417802  82.99
        3             118863     0.05

        Master Count  Time       Percentage
        0             1082319    0.5
        1             35578388   16.46

        83% of the time, there were 2 slaves to a partition.
        93% of the time, there was 1 master to a partition.
        We can see for exactly how long the cluster was out of whack.
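The time aggregation above amounts to summing, for each replica count, the duration between consecutive log events and dividing by the total window. A minimal sketch of that computation, assuming an event list of (timestamp, slaveCount) pairs; all names are hypothetical, not part of the Helix plugin.

```java
// Sketch: "percentage of time with k slaves" from a timestamped event log.
import java.util.*;

public class TimeAggregation {
  static Map<Integer, Double> percentages(long[] times, int[] counts, long end) {
    Map<Integer, Long> durations = new TreeMap<>();
    for (int i = 0; i < times.length; i++) {
      // each count holds from its event until the next event (or the window end)
      long next = (i + 1 < times.length) ? times[i + 1] : end;
      durations.merge(counts[i], next - times[i], Long::sum);
    }
    long total = end - times[0];
    Map<Integer, Double> pct = new TreeMap<>();
    for (Map.Entry<Integer, Long> e : durations.entrySet()) {
      pct.put(e.getKey(), 100.0 * e.getValue() / total);
    }
    return pct;
  }

  public static void main(String[] args) {
    // 2 slaves from t=0..80, then 1 slave from t=80..100 -> 80% and 20%
    System.out.println(percentages(new long[]{0, 80}, new int[]{2, 1}, 100));
    // {1=20.0, 2=80.0}
  }
}
```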
  • 71. Coming Up Next New APIs Automatic scaling with YARN Non-JVM participants
  • 72. Summary • Helix: A generic framework for building distributed systems • Abstraction and modularity allow for modifying and enhancing system behavior • Simple programming model: declarative state machine
  • 73. Questions? ? website helix.incubator.apache.org dev mailing list dev@helix.incubator.apache.org user mailing list user@helix.incubator.apache.org twitter @apachehelix