ZooKeeper 
A Distributed Coordination Service for Distributed Applications 
By Fx_bull
1. ZooKeeper architecture and features 
2. ZooKeeper node roles 
3. ZooKeeper configuration 
4. ZooKeeper data model (introducing znodes, zxids, etc.) 
5. ZooKeeper data read/write 
6. Key mechanisms, including leader election, the role of logs and 
snapshots, why an odd number of nodes, and why a write completes 
only after more than half of the followers agree, etc.
What’s ZooKeeper 
ZooKeeper is a distributed, open-source coordination service for distributed applications. 
It exposes a simple set of primitives that distributed applications can build upon to 
implement higher-level services for synchronization, configuration maintenance, and 
groups and naming. It is designed to be easy to program to, and uses a data model styled 
after the familiar directory tree structure of file systems. 
The motivation behind ZooKeeper is to relieve distributed applications of the responsibility 
of implementing coordination services from scratch. 
It is an open-source implementation of Chubby.
12/11/14 
Zookeeper architecture 
A ZooKeeper service consists of multiple servers: 
one leader and multiple followers. 
High performance: it can be used in large, distributed systems. 
Highly available: its reliability features keep it from being a single point of failure. 
Strictly ordered access: sophisticated synchronization primitives can be implemented at the client. 
The servers that make up the ZooKeeper service must all know about each other. 
ZooKeeper uses a configuration file so the servers know about each other, and PING 
messages are exchanged between followers and the leader to determine liveness. 
note: here "ping" means sending a packet to a specified port.
ZooKeeper achieves high availability through replication, and can provide a 
service as long as a majority of the machines in the ensemble are up. 
For example, in a five-node ensemble, any two machines can fail and the 
service will still work because a majority of three remain. Note that a six-node 
ensemble can also tolerate only two machines failing, since with three 
failures the remaining three do not constitute a majority of the six. For this 
reason, it is usual to have an odd number of machines in an ensemble. 
Another reason: with an even split, neither side can form a majority, so no value can be approved.
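The majority arithmetic above can be sketched in a few lines of Python (a plain illustration, not ZooKeeper code):

```python
# How many server failures an ensemble of n nodes tolerates while a
# strict majority remains alive: a majority is floor(n/2) + 1 servers.
def tolerated_failures(n: int) -> int:
    return n - (n // 2 + 1)

for n in range(3, 8):
    print(n, "servers tolerate", tolerated_failures(n), "failures")
```

Note that five and six servers both tolerate two failures, which is why the extra even node buys nothing.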
feature 
1. It is especially fast in "read-dominant" workloads. 
2. ZooKeeper is replicated. Like the distributed processes it 
coordinates, ZooKeeper itself is intended to be replicated over a set of 
hosts called an ensemble. 
3. Every update made to the znode tree is given a globally unique 
identifier, called a zxid (which stands for “ZooKeeper transaction ID”).
Zookeeper Data Model 
A shared hierarchical namespace, similar to a standard file system. 
Each node in the hierarchy is called a znode. 
ZooKeeper was designed to store coordination data, so each znode is very small: 
status information (version numbers for data changes, ACL changes, and 
timestamps), configuration, and location information.
znode data structure 
czxid: The zxid of the change that caused this znode to be created. 
mzxid: The zxid of the change that last modified this znode. 
pzxid: The zxid of the change that last modified the children of this znode. 
ctime: The time in milliseconds from epoch when this znode was created. 
mtime: The time in milliseconds from epoch when this znode was last modified. 
version: The number of changes to the data of this znode. 
cversion: The number of changes to the children of this znode. 
aversion: The number of changes to the ACL of this znode. 
ephemeralOwner: The session id of the owner of this znode if the znode is an 
ephemeral node. If it is not an ephemeral node, it will be zero. 
dataLength: The length of the data field of this znode. The maximum allowable 
size of the data array is 1 MB. 
numChildren: The number of children of this znode.
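As an illustration, the stat fields above can be mirrored in a small data structure (a sketch for reference only; this is not the client library's Stat class):

```python
from dataclasses import dataclass

@dataclass
class ZnodeStat:
    czxid: int           # zxid of the create
    mzxid: int           # zxid of the last data change
    pzxid: int           # zxid of the last child change
    ctime: int           # creation time, ms since epoch
    mtime: int           # last-modified time, ms since epoch
    version: int         # number of data changes
    cversion: int        # number of child changes
    aversion: int        # number of ACL changes
    ephemeralOwner: int  # owner session id, or 0 if persistent
    dataLength: int      # data size in bytes (at most 1 MB)
    numChildren: int

    def is_ephemeral(self) -> bool:
        return self.ephemeralOwner != 0
```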
ZooKeeper is replicated. 
In theory, a client will see the same view of the system regardless of the server it 
connects to. 
Like the distributed processes it coordinates, ZooKeeper itself is intended to be 
replicated over a set of hosts called an ensemble. 
All of the servers hold the same data, guaranteed by a Paxos-like consensus protocol.
role of zookeeper 
• Leader: responsible for initiating and resolving 
the final vote, and for applying the state update in the end. 
note: it is possible to configure ZooKeeper so that 
the leader does not accept client connections; set 
zookeeper.leaderServes to "no". 
• Follower: receives client requests 
and returns results to the client; participates in 
votes initiated by the leader; synchronizes 
with the leader and replicates its transactions.
• Observer: observers can enhance the read performance of the 
cluster without affecting write performance; an observer only 
accepts read requests, and write requests are forwarded to the leader. 
The problem is that as we add more voting members, the write 
performance drops. This is due to the fact that a write operation requires 
the agreement of (in general) at least half the nodes in an ensemble, and 
therefore the cost of a vote can increase significantly as more voters are 
added. 
peerType=observer 
server.1:localhost:2181:3181:observer 
detail: http://zookeeper.apache.org/doc/trunk/zookeeperObservers.html
Read data from the connected server 
Read requests are serviced from the local replica of each 
server's database.
Write data: Paxos 
• N senators make decisions on Paxos island 
• Each proposal has an increasing PID 
• If more than half of the senators approve a 
proposal, it passes 
• Each senator only accepts a proposal whose 
PID is larger than the current PID 
ZooKeeper 
Senator -> Server 
proposal -> ZnodeChange 
PID -> ZooKeeper transaction id (zxid)
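The two island rules can be sketched as a toy vote (a caricature of the majority rule, not a full Paxos implementation):

```python
class Senator:
    def __init__(self):
        self.highest_pid = 0

    def vote(self, pid: int) -> bool:
        # Only accept a proposal whose PID is larger than the current one.
        if pid > self.highest_pid:
            self.highest_pid = pid
            return True
        return False

def propose(senators, pid):
    votes = sum(s.vote(pid) for s in senators)
    return votes > len(senators) // 2   # more than half pass it

island = [Senator() for _ in range(5)]
print(propose(island, 1))   # True: all five accept PID 1
print(propose(island, 1))   # False: PID 1 is no longer larger
```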
paxos 
http://en.wikipedia.org/wiki/Paxos_algorithm 
http://zh.wikipedia.org/zh-cn/Paxos%E7%AE%97%E6%B3%95 
http://research.microsoft.com/pubs/64624/tr-2005-112.pdf 
http://rdc.taobao.com/blog/cs/?p=162
Write data: Client -> ZooKeeper 
Write requests are processed by an agreement protocol: 
a leader proposes a request, collects votes, and finally commits. 
1. Client sends a write request to a server. 
2. The server forwards the write request to the leader. 
3. The leader sends a PROPOSAL message to all the followers (sent asynchronously). 
4. Followers agree or deny (an ACK is sent by a follower after it has synced the proposal). 
5. Commit. 
6. Send the response to the client.
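Steps 1-6 can be simulated with toy classes (names are illustrative, not real ZooKeeper internals):

```python
# Minimal simulation of the propose/ack/commit flow above.
class Follower:
    def __init__(self):
        self.data = {}
        self.pending = None

    def on_proposal(self, key, value):
        self.pending = (key, value)   # step 4: sync to disk, then ACK
        return "ACK"

    def on_commit(self):              # step 5: apply to in-memory tree
        key, value = self.pending
        self.data[key] = value

class Leader(Follower):
    def __init__(self, followers):
        super().__init__()
        self.followers = followers

    def request(self, key, value):    # steps 2-6
        acks = sum(f.on_proposal(key, value) == "ACK" for f in self.followers)
        ensemble = len(self.followers) + 1
        if acks + 1 > ensemble // 2:  # the leader's own ack counts
            self.on_proposal(key, value)
            self.on_commit()
            for f in self.followers:
                f.on_commit()
            return "OK"               # step 6
        return "FAIL"

followers = [Follower() for _ in range(4)]
leader = Leader(followers)
print(leader.request("/x", 1))   # OK
print(followers[0].data)         # {'/x': 1}
```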
note: 
All machines in the ensemble write updates to disk before updating their 
in-memory copy of the znode tree. 
Updates are logged to disk for recoverability, and writes are serialized to 
disk before they are applied to the in-memory database. 
http://zookeeper.apache.org/doc/r3.2.2/zookeeperOver.html 
SyncRequestProcessor 
ZkDatabase 
On restart, ZkDatabase loads the database from disk into memory 
at boot time. 
This class maintains the in-memory database of ZooKeeper server state, 
which includes the sessions, the datatree, and the committed logs. 
It is booted up after reading the logs and snapshots from the disk.
log and snapshot: 
SyncRequestProcessor 
When is a snapshot taken? 
1. when the leader changes 
2. when a new server joins 
3. when logCount > (snapCount/2 + randRoll) 
Snapshots are used together with the logs for recoverability. 
detail : http://rdc.taobao.com/team/jm/archives/947
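Rule 3 can be sketched as follows; snapCount defaults to 100000 in ZooKeeper, and randRoll randomizes the threshold so the servers in an ensemble do not all snapshot at the same moment (the exact constants here are illustrative):

```python
import random

snap_count = 100_000
rand_roll = random.randrange(snap_count // 2)   # re-drawn after each snapshot

def should_snapshot(log_count: int) -> bool:
    return log_count > snap_count // 2 + rand_roll

print(should_snapshot(0))           # False: log still small
print(should_snapshot(snap_count))  # True: past any possible threshold
```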
question 
1. What happens when the leader crashes? 
2. A follower may lag behind the leader, 
so a client may read outdated data. 
3. Why does an update only need more than half of the nodes?
Leader Selection 
Each server sends its vote to the others: 
the selected leader id, 
its zxid, 
a logic clock (initial value 0), 
and its status: LOOKING, FOLLOWING, OBSERVING, or LEADING. 
Example with five servers starting one by one: 
Step 1: Server1 starts; no response from the others, so it stays LOOKING. 
Step 2: Server2 starts; Server2 leads the vote, but fewer than half of the 
servers agree, so both stay LOOKING. 
Step 3: Server3 starts; Server3 leads the vote and more than half of the 
servers agree, so Server3 becomes LEADING. 
Step 4: Server4 starts; there is already a leader, so it becomes FOLLOWING. 
Step 5: Server5 starts; there is already a leader, so it becomes FOLLOWING.
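The five steps can be replayed with a toy election function (heavily simplified; the real fast leader election exchanges notification messages, but the winning rule is the same: highest (zxid, server id) pair, with a strict majority of the full ensemble required):

```python
def elect(running_ids, zxids, ensemble_size):
    # All running servers converge on the best (zxid, server id) pair.
    best = max(running_ids, key=lambda sid: (zxids[sid], sid))
    votes = len(running_ids)            # everyone adopts that vote
    return best if votes > ensemble_size // 2 else None   # None = LOOKING

zxids = {1: 0, 2: 0, 3: 0, 4: 0, 5: 0}   # fresh cluster, equal zxids
print(elect({1}, zxids, 5))        # None: 1 of 5, still LOOKING
print(elect({1, 2}, zxids, 5))     # None: Server2 leads, no quorum yet
print(elect({1, 2, 3}, zxids, 5))  # 3: quorum of three, Server3 leads
```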
Leader selection 
note: dataVersion->zxid 
http://zookeeper.apache.org/doc/r3.2.2/zookeeperInternals.html#sc_leaderElection 
http://rdc.taobao.com/blog/cs/?p=162
This phase is finished once a majority (or quorum) of followers have 
synchronized their state with the leader. 
The sync operation forces the ZooKeeper server to which a 
client is connected to “catch up” with the leader.
question 2 (how can a client avoid reading outdated data from a lagging follower?) 
1. use sync 
2. use a watcher 
3. limit use to application scenarios that can tolerate it
Watcher: znode data changes notify the ZooKeeper client 
1. ZooKeeper supports the concept of watches. 
2. Clients can set a watch on a znode. 
3. A watch is triggered and removed when the znode changes. 
4. When a watch is triggered, the client receives a packet saying that the znode has 
changed.
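The one-shot semantics of points 3-4 can be mimicked with a toy in-memory store (this is not the ZooKeeper client API):

```python
# A watch fires once on the next change and is then removed.
class TinyZnodeStore:
    def __init__(self):
        self.data, self.watches = {}, {}

    def get(self, path, watch=None):
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.data.get(path)

    def set(self, path, value):
        self.data[path] = value
        for w in self.watches.pop(path, []):   # triggered, then removed
            w(path)

events = []
store = TinyZnodeStore()
store.get("/config", watch=events.append)
store.set("/config", "v1")   # fires the watch
store.set("/config", "v2")   # no watch left; the client must re-register
print(events)                # ['/config']
```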
question 3: why only more than half of the nodes? 
• performance
Zookeeper Performance 
It is especially high performance in applications where reads outnumber 
writes, since writes involve synchronizing the state of all servers. (Reads 
outnumbering writes is typically the case for a coordination service.)
use of zookeeper 
1. Master election 
/currentMaster/{sessionId}-1, 
/currentMaster/{sessionId}-2, 
/currentMaster/{sessionId}-3 
using EPHEMERAL_SEQUENTIAL nodes
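The recipe can be mimicked without a server (a toy counter stands in for ZooKeeper's sequence numbers; paths follow the pattern above):

```python
import itertools

# Each candidate creates an ephemeral sequential znode under
# /currentMaster; the lowest sequence number is the master.
counter = itertools.count(1)

def create_candidate(session_id: str) -> str:
    return f"/currentMaster/{session_id}-{next(counter)}"

def current_master(znodes) -> str:
    # Lowest sequence suffix wins.
    return min(znodes, key=lambda p: int(p.rsplit("-", 1)[1]))

nodes = [create_candidate(s) for s in ("sessA", "sessB", "sessC")]
print(current_master(nodes))          # /currentMaster/sessA-1
nodes.remove(current_master(nodes))   # master's session dies -> znode gone
print(current_master(nodes))          # /currentMaster/sessB-2
```

In real ZooKeeper the removal happens automatically when the master's session ends, because the znode is ephemeral.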
2. HBase uses ZooKeeper to: 
select a master; 
discover which master controls which servers; 
help the client find its master.
Configuration Management (push) 
1. Every server corresponds to a znode in ZooKeeper (Client1 P1, C2 P2, 
...). 
2. Multiple servers in one cluster may share one configuration. 
3. When the configuration changes, they should receive a notification.
Cluster Management 
1. When one machine dies, the other machines should receive a notification. 
2. When one server dies, its znode will be automatically removed (C1 P1, C2 P2, ...). 
3. When the master machine dies, how do we select the new master? Paxos!
Other: Queues, Double Barriers, Two-phased Commit, etc. 
reference: 
• http://zookeeper.apache.org/doc/r3.3.2/recipes.html 
• http://rdc.taobao.com/team/jm/archives/1232 
The summary there is very detailed, so it is not pasted here.
Configuration 
Each server in the ensemble of ZooKeeper servers has a numeric 
identifier that is unique within the ensemble, and must fall between 1 and 
255. 
It follows that an ensemble can contain at most 255 servers. 
A ZooKeeper service usually consists of three to seven machines. The 
implementation supports more machines, but three to seven machines 
provide more than enough performance and resilience.
zoo.cfg 
tickTime=2000 
dataDir=/disk1/zookeeper 
dataLogDir=/disk2/zookeeper 
clientPort=2181 
initLimit=5 
syncLimit=2 
server.1=zookeeper1:2888:3888 
server.2=zookeeper2:2888:3888 
server.3=zookeeper3:2888:3888
initLimit is the amount of time to allow followers to connect to 
and sync with the leader. If a majority of followers fail to sync 
within this period, then the leader renounces its leadership status 
and another leader election takes place. If this happens often (and 
you can discover if this is the case because it is logged), it is a sign 
that the setting is too low. (5 ticks = 10 s with tickTime=2000) 
syncLimit is the amount of time to allow a follower to sync with 
the leader. If a follower fails to sync within this period, it will 
restart itself. Clients that were attached to this follower will connect 
to another one. (2 ticks = 4 s with tickTime=2000)
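Both limits are counted in ticks, so with the zoo.cfg above they work out as follows (simple arithmetic, shown for clarity):

```python
tick_time_ms = 2000   # tickTime from zoo.cfg
init_limit = 5        # initLimit, in ticks
sync_limit = 2        # syncLimit, in ticks

print(init_limit * tick_time_ms / 1000, "s")  # 10.0 s for the initial sync
print(sync_limit * tick_time_ms / 1000, "s")  # 4.0 s for ongoing sync
```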
Servers listen on three ports: 
2181 for client connections; 
2888 for follower connections, if they are the leader; 
3888 for other server connections during the leader 
election phase.
FAQ 
How do I size a ZooKeeper ensemble (cluster)? 
In general when determining the number of ZooKeeper serving nodes to deploy (the size 
of an ensemble) you need to think in terms of reliability, and not performance. 
Reliability: 
A single ZooKeeper server (standalone) is essentially a coordinator with no reliability (a 
single serving node failure brings down the ZK service). 
A 3 server ensemble (you need to jump to 3 and not 2 because ZK works based on 
simple majority voting) allows for a single server to fail and the service will still be 
available. 
So if you want reliability go with at least 3. We typically recommend having 5 servers in 
"online" production serving environments. This allows you to take 1 server out of service 
(say planned maintenance) and still be able to sustain an unexpected outage of one of the 
remaining servers w/o interruption of the service. 
Performance: 
Write performance actually decreases as you add ZK servers, while read performance 
increases modestly: http://bit.ly/9JEUju
faq 
• http://rdc.taobao.com/team/jm/archives/1384
1. After leader election finishes, or when a new ZooKeeper server joins, 
followers must synchronize with the leader. The configuration is often set to 
4 s, but the in-memory image can sometimes be very large and the sync may 
not finish within 4 s; what then? 
2. ZooKeeper is described as a "coarse-grained" lock service; how should 
"coarse-grained" be understood? 
3. The Definitive Guide stresses that a write returns only after a majority 
has persisted it?? 
4. Is it true that not every operation is persisted??
QuorumPeerMain (replicated mode) or ZooKeeperServerMain (standalone mode) is used to start the server.
Processor Chain 
LeaderZooKeeperServer 
FollowerZooKeeperServer
summary 
• Hadoop ZooKeeper: 
an open-source implementation of Chubby. 
• Data model: 
a shared hierarchical namespace, similar to a standard file system. 
• One-leader, many-followers architecture: 
each follower has the same data model; 
a Paxos-like protocol is used to implement consistency. 
• Watcher: 
clients can monitor znode changes with watchers.
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyviewmasabamasaba
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...masabamasaba
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
tonesoftg
tonesoftgtonesoftg
tonesoftglanshi9
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 

Dernier (20)

Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 

Zookeeper Introduce

  • 1. zookeeper A Distributed Coordination Service for Distributed Applications By Fx_bull
  • 2. 1. ZooKeeper architecture and features 2. ZooKeeper node roles 3. ZooKeeper configuration 4. ZooKeeper data model (introducing znode, zxid, etc.) 5. ZooKeeper data read/write 6. Key mechanisms, including leader election, the role of logs and snapshots, why an odd number of nodes, and why a write completes only after more than half of the followers agree
  • 3. What’s zookeeper ZooKeeper is a distributed, open-source coordination service for distributed applications. It exposes a simple set of primitives that distributed applications can build upon to implement higher-level services for synchronization, configuration maintenance, and groups and naming. It is designed to be easy to program to, and uses a data model styled after the familiar directory tree structure of file systems. The motivation behind ZooKeeper is to relieve distributed applications of the responsibility of implementing coordination services from scratch. It is an open-source implementation of Chubby.
  • 4. ZooKeeper architecture ZooKeeper consists of multiple servers: one leader and multiple followers. High performance: it can be used in large, distributed systems. Highly available: its reliability keeps it from being a single point of failure. Strictly ordered access: sophisticated synchronization primitives can be implemented at the client.
  • 5. The servers that make up the ZooKeeper service must all know about each other. ZooKeeper uses a configuration file so that servers know about each other, and PING messages are exchanged between followers and the leader to determine liveness. note: a ping here means sending a packet to a specified port
  • 6. ZooKeeper achieves high availability through replication, and can provide a service as long as a majority of the machines in the ensemble are up. For example, in a five-node ensemble, any two machines can fail and the service will still work because a majority of three remain. Note that a six-node ensemble can also tolerate only two machines failing, since with three failures the remaining three do not constitute a majority of the six. For this reason, it is usual to have an odd number of machines in an ensemble. Another reason: with an exactly even split, no majority can form, so no value can be approved.
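The majority rule above can be sketched in a few lines (illustrative Python, not part of ZooKeeper):

```python
def tolerated_failures(ensemble_size: int) -> int:
    """A majority of the ensemble must stay up, so an ensemble of n
    servers tolerates n - (n // 2 + 1) failures."""
    majority = ensemble_size // 2 + 1
    return ensemble_size - majority

# A 5-node and a 6-node ensemble both tolerate only 2 failures,
# which is why odd ensemble sizes are preferred.
print(tolerated_failures(5))  # 2
print(tolerated_failures(6))  # 2
```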
  • 7. Features 1. It is especially fast in "read-dominant" workloads. 2. ZooKeeper is replicated: like the distributed processes it coordinates, ZooKeeper itself is intended to be replicated over a set of hosts called an ensemble. 3. Every update made to the znode tree is given a globally unique identifier, called a zxid (which stands for "ZooKeeper transaction ID").
  • 8. ZooKeeper data model A shared hierarchical namespace, similar to a standard file system. Each node in the tree is called a znode. ZooKeeper was designed to store coordination data, so each znode holds only small amounts of data: status information (version numbers for data changes, ACL changes, and timestamps), configuration, and location information.
  • 9.
  • 10. znode data structure czxid: the zxid of the change that caused this znode to be created. mzxid: the zxid of the change that last modified this znode. ctime: the time in milliseconds from epoch when this znode was created. mtime: the time in milliseconds from epoch when this znode was last modified. version: the number of changes to the data of this znode. cversion: the number of changes to the children of this znode. aversion: the number of changes to the ACL of this znode. ephemeralOwner: the session id of the owner of this znode if the znode is an ephemeral node; otherwise zero. dataLength: the length of the data field of this znode; the maximum allowable size of the data array is 1 MB. numChildren: the number of children of this znode. pzxid: the zxid of the change that last modified the children of this znode.
  • 11. ZooKeeper is replicated. In theory, a client will see the same view of the system regardless of the server it connects to. Like the distributed processes it coordinates, ZooKeeper itself is intended to be replicated over a set of hosts called an ensemble. All of the servers hold the same data, guaranteed by the Fast Paxos algorithm.
  • 12. Roles in ZooKeeper • Leader: responsible for initiating and resolving the final vote, and for updating the state at the end. note: ZooKeeper can be configured so that the leader does not accept client connections; set zookeeper.leaderServes to "no". • Follower: receives client requests and returns results to the client, and participates in votes initiated by the leader. The server synchronizes with the leader and replicates any transactions.
  • 13. • Observer: observers can enhance the read performance of the cluster without affecting write performance; an observer only accepts read requests and forwards write requests to the leader. The problem is that as we add more voting members, write performance drops. This is because a write operation requires the agreement of (in general) at least half the nodes in an ensemble, so the cost of a vote can increase significantly as more voters are added. peerType=observer server.1:localhost:2181:3181:observer detail: http://zookeeper.apache.org/doc/trunk/zookeeperObservers.html
  • 14. Read data from the connected server Read requests are serviced from the local replica of each server's database
  • 15. Write data: Paxos • N senators make decisions on Paxos island • Each proposal has an increasing PID • A proposal passes once more than half of the senators approve it • Each senator only agrees to proposals whose PID is larger than the current PID Mapping to ZooKeeper: senator -> server, proposal -> znode change, PID -> ZooKeeper transaction id
  • 16. paxos http://en.wikipedia.org/wiki/Paxos_algorithm http://zh.wikipedia.org/zh-cn/Paxos%E7%AE %97%E6%B3%95 http://research.microsoft.com/pubs/64624/tr-2005-112.pdf http://rdc.taobao.com/blog/cs/?p=162
  • 17. Write data: client ZooKeeper write requests are processed by an agreement protocol: a leader proposes a request, collects votes, and finally commits. 1. Client sends a write request to a server 2. The server forwards the write request to the leader 3. The leader sends a PROPOSAL message to all the followers (asynchronously) 4. Followers agree or deny (an ACK is sent by a follower after it has synced a proposal) 5. Commit 6. Send the response to the client
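The commit rule in steps 3 to 5 above can be modeled in a short sketch. The function name is hypothetical; real ZooKeeper implements this inside the Zab protocol, but the arithmetic is the same: the leader's own implicit ack plus follower ACKs must form a strict majority of the ensemble.

```python
def can_commit(ensemble_size, follower_acks):
    """True once the proposal has a strict majority of acks."""
    acks = 1 + follower_acks          # the leader counts as one ack
    return acks > ensemble_size // 2  # strict majority required

# 5-server ensemble: 2 follower ACKs (plus the leader) form a majority.
print(can_commit(5, 2))  # True
print(can_commit(5, 1))  # False
```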
  • 18. note: all machines in the ensemble write updates to disk before updating their in-memory copy of the znode tree. Updates are logged to disk for recoverability, and writes are serialized to disk before they are applied to the in-memory database. http://zookeeper.apache.org/doc/r3.2.2/zookeeperOver.html SyncRequestProcessor ZkDatabase on restart: ZkDatabase loads the database from disk into memory at boot. This class maintains the in-memory database of ZooKeeper server state, including the sessions, the data tree, and the committed logs. It is booted up after reading the logs and snapshots from disk.
  • 19. Log and snapshot: SyncRequestProcessor A snapshot is taken: 1. when the leader changes 2. when a new server joins 3. when logCount > (snapCount/2 + randRoll) The snapshot, together with the logs, is used for recoverability. detail: http://rdc.taobao.com/team/jm/archives/947
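The third trigger above can be sketched as follows. Variable names mirror the slide, not the actual ZooKeeper source; the random roll exists so that the servers of an ensemble do not all pause to snapshot at the same moment.

```python
import random

SNAP_COUNT = 100_000  # default of the zookeeper.snapCount property

def should_snapshot(log_count: int, rand_roll: int) -> bool:
    """Snapshot once the logged transaction count passes the threshold."""
    return log_count > (SNAP_COUNT // 2 + rand_roll)

# rand_roll is picked per log file; here at most snapCount/2, so the
# threshold never exceeds snapCount.
rand_roll = random.randint(0, SNAP_COUNT // 2)
print(should_snapshot(100_001, rand_roll))  # True: beyond the maximum threshold
```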
  • 20. Questions 1. What happens when the leader crashes? 2. A follower may lag behind the leader, so a client may read outdated data. 3. Why does an update only need agreement from half of the nodes?
  • 21. Leader selection Each server sends its vote: the selected leader id, its zxid, a logic clock (initial value 0), and its status (LOOKING, FOLLOWING, OBSERVING, LEADING). With Server1 through Server5 starting one by one: Step 1: Server1 gets no response, so it stays LOOKING. Step 2: Server1 and Server2 vote for Server2 as leader, but fewer than half of the servers agree, so they stay LOOKING. Step 3: once Server3 starts, more than half of the servers agree that Server3 is leader, so Server3 becomes LEADING. Steps 4 and 5: Server4 and Server5 find there is already a leader and become FOLLOWING.
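The five steps above can be modeled in a toy election round (an illustration of the idea, not the FastLeaderElection implementation): votes are compared by (zxid, server id), every live server adopts the strongest vote, and a candidate wins only once more than half of the full ensemble agrees.

```python
def election_round(live_servers, ensemble_size=5):
    """live_servers: {server_id: last_zxid} of servers currently up.
    Returns the elected leader id, or None if the vote stays LOOKING."""
    if not live_servers:
        return None
    # Prefer the highest zxid; break ties with the highest server id.
    winner = max(live_servers, key=lambda sid: (live_servers[sid], sid))
    votes = len(live_servers)       # every live server adopts this vote
    if votes > ensemble_size // 2:  # need agreement from a majority
        return winner
    return None

# Servers start one by one with equal zxids, as on the slide:
print(election_round({1: 0}))              # None (1 vote of 5: LOOKING)
print(election_round({1: 0, 2: 0}))        # None (2 votes: still LOOKING)
print(election_round({1: 0, 2: 0, 3: 0}))  # 3    (majority: Server3 LEADING)
```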
  • 22. Leader selection note: dataVersion -> zxid http://zookeeper.apache.org/doc/r3.2.2/zookeeperInternals.html#sc_leaderElection http://rdc.taobao.com/blog/cs/?p=162
  • 23. This phase is finished once a majority (or quorum) of followers have synchronized their state with the leader.
  • 24. Question 2: 1. use sync 2. use a watcher. The sync operation forces the ZooKeeper server to which a client is connected to "catch up" with the leader. These remedies limit the applicable scenarios.
  • 25. Watcher: ZooKeeper client 1. ZooKeeper supports the concept of watches. 2. Clients can set a watch on a znode. 3. A watch is triggered and removed when the znode changes. 4. When a watch is triggered, the client receives a packet saying that the znode has changed.
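A minimal model of the one-shot semantics in points 3 and 4 above (illustrative only; a real client registers watches through the ZooKeeper client library): the watch fires once when the znode changes and is then removed, so the client must re-register to keep watching.

```python
class Znode:
    """Toy znode that mimics ZooKeeper's one-shot watch behavior."""

    def __init__(self, data=b""):
        self.data = data
        self._watchers = []

    def watch(self, callback):
        self._watchers.append(callback)

    def set_data(self, data):
        self.data = data
        # One-shot: watches are removed before being triggered.
        watchers, self._watchers = self._watchers, []
        for cb in watchers:
            cb("NodeDataChanged")

events = []
node = Znode()
node.watch(events.append)
node.set_data(b"v1")   # triggers the watch
node.set_data(b"v2")   # no watch registered any more
print(events)          # ['NodeDataChanged'] -- fired exactly once
```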
  • 26. Question 3: why only half (a majority) of the nodes? • performance
  • 27. ZooKeeper performance It is especially high-performance in applications where reads outnumber writes, since writes involve synchronizing the state of all servers. (Reads outnumbering writes is typically the case for a coordination service.)
  • 28. Uses of ZooKeeper 1. Master election /currentMaster/{sessionId}-1, /currentMaster/{sessionId}-2, /currentMaster/{sessionId}-3 using EPHEMERAL_SEQUENTIAL nodes
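The recipe above works because each contender creates an EPHEMERAL_SEQUENTIAL znode and the one with the lowest sequence number is the master; if the master's session dies, its ephemeral node vanishes and the next-lowest takes over. A sketch of the selection step (the node names below are hypothetical stand-ins for real znode paths):

```python
def current_master(children):
    """children: names shaped like '{sessionId}-{seq}' under /currentMaster.
    The contender with the lowest sequence number is the master."""
    return min(children, key=lambda name: int(name.rsplit("-", 1)[1]))

nodes = ["0xa1-3", "0xb2-1", "0xc3-2"]
print(current_master(nodes))  # '0xb2-1' -- lowest sequence number wins
```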
  • 29. 2. HBase uses ZooKeeper to select a master, discover which master controls which servers, and help the client find its master.
  • 30. Configuration management (push) 1. Every server corresponds to a znode in ZooKeeper. (Client1 P1, C2 P2, …) 2. Multiple servers in one cluster may share one configuration. 3. When the configuration changes, they should receive a notification
  • 31. Cluster management 1. When one machine dies, the other machines should receive a notification. 2. When one server dies, its znode is automatically removed. (C1 P1, C2 P2 …) 3. When the master machine dies, how is the new master selected? Paxos!
  • 32. Other: queues, double barriers, two-phase commit, etc. reference: • http://zookeeper.apache.org/doc/r3.3.2/recipes.html • http://rdc.taobao.com/team/jm/archives/1232 (a very detailed summary, so it is not pasted here)
  • 33. Configuration Each server in a ZooKeeper ensemble has a numeric identifier that is unique within the ensemble and must fall between 1 and 255, so an ensemble has at most 255 servers. A ZooKeeper service usually consists of three to seven machines. The implementation supports more machines, but three to seven provide more than enough performance and resilience. So if you want reliability, go with at least 3. We typically recommend having 5 servers in "online" production serving environments; this allows you to take 1 server out of service (say, for planned maintenance) and still sustain an unexpected outage of one of the remaining servers without interruption of the service.
  • 34. 12/11/14 zoo.cfg tickTime=2000 dataDir=/disk1/zookeeper dataLogDir=/disk2/zookeeper clientPort=2181 initLimit=5 syncLimit=2 server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
  • 35. initLimit is the amount of time to allow followers to connect to and sync with the leader. If a majority of followers fail to sync within this period, the leader renounces its leadership status and another leader election takes place. If this happens often (and you can discover if this is the case because it is logged), it is a sign that the setting is too low. (10 s) syncLimit is the amount of time to allow a follower to sync with the leader. If a follower fails to sync within this period, it restarts itself, and clients that were attached to it connect to another follower. (4 s)
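Both limits are expressed in ticks of tickTime milliseconds, which is where the 10 s and 4 s figures come from given the zoo.cfg shown earlier:

```python
# Values from the sample zoo.cfg: tickTime=2000, initLimit=5, syncLimit=2.
tick_time_ms = 2000
init_limit_ticks = 5
sync_limit_ticks = 2

# Timeouts in seconds: limit (in ticks) multiplied by the tick length.
init_timeout_s = tick_time_ms * init_limit_ticks / 1000
sync_timeout_s = tick_time_ms * sync_limit_ticks / 1000
print(init_timeout_s, sync_timeout_s)  # 10.0 4.0
```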
  • 36. Servers listen on three ports: 2181 for client connections; 2888 for follower connections, if they are the leader; 3888 for other server connections during the leader election phase.
  • 37. FAQ How do I size a ZooKeeper ensemble (cluster)? In general, when determining the number of ZooKeeper serving nodes to deploy (the size of an ensemble), think in terms of reliability, not performance. Reliability: a single ZooKeeper server (standalone) is essentially a coordinator with no reliability (a single serving-node failure brings down the ZK service). A 3-server ensemble (you need to jump to 3 and not 2 because ZK works on simple majority voting) allows a single server to fail while the service remains available. So if you want reliability, go with at least 3. We typically recommend having 5 servers in "online" production serving environments; this allows you to take 1 server out of service (say, for planned maintenance) and still sustain an unexpected outage of one of the remaining servers without interruption of the service. Performance: write performance actually decreases as you add ZK servers, while read performance increases modestly: http://bit.ly/9JEUju
  • 39. 1. After leader election, or when a new ZooKeeper server joins, followers must synchronize with the leader. The configuration is often set to 4 s, but the in-memory image can sometimes be large and synchronization may not finish within 4 s; what then? 2. A "coarse-grained" lock service: how should "coarse-grained" be understood? 3. The Definitive Guide stresses that a response is returned only after a majority have persisted the update?? 4. Is the update not persisted every time??
  • 40. QuorumPeerMain or ZooKeeperServerMain is used to start the server
  • 41. Processor Chain LeaderZooKeeperServer FollowerZooKeeperServer
  • 42. Summary • Hadoop ZooKeeper: an open-source implementation of Chubby. • Data model: a shared hierarchical namespace, similar to a standard file system. • Architecture: one leader, multiple followers; followers share the same data model; a Paxos-style algorithm is used for consistency. • Watcher: a client can monitor znode changes with watchers.

Editor's notes

  1. Through the configuration file. How do the leader and the followers know each other are still alive in ZooKeeper? The leader periodically pings the followers and the followers periodically ping the leader; when a server finds the leader unreachable, it changes its own state to LOOKING and starts a new round of election
  2. The leader periodically pings the followers and the followers periodically ping the leader; when the leader is found unreachable, a server changes its own state (LOOKING) and starts a new round of election
  3. If exactly half of the acceptors approve value A and the other half approve value B, no majority can form and neither value can be approved.
  4. The data on all hosts is consistent, guaranteed by the Fast Paxos algorithm
  5. Here we assume there is no Byzantine failure (a message may possibly be delivered twice, but an incorrect message never appears), and that as long as you wait long enough, messages are delivered. Also, the senators on Paxos island do not oppose resolutions proposed by other senators. Before a proposer makes a proposal, it first communicates with enough acceptors to form a majority and obtains the number of their most recent approval (the prepare phase); based on the replies it decides the value of this proposal and starts the vote. Once a majority of acceptors approve, the proposal passes and the proposer informs the learners. Passing a resolution has two phases. Prepare phase: the proposer chooses a proposal number n and sends a prepare request to a majority of the acceptors; on receiving a prepare message, if the proposal number is larger than all prepare messages it has already replied to, the acceptor replies to the proposer with its most recent approval and promises not to reply to any proposal numbered lower than n. Accept phase: once a proposer has received replies to its prepare request from a majority of acceptors, it enters the accept phase. It sends an accept request to the acceptors that replied, containing the number n and the value decided according to P2c (if P2c does not determine a value, it may choose the value freely). Without violating its promises to other proposers, an acceptor approves the request upon receiving it.
  6. We are able to simplify the two-phase commit protocol because we do not have aborts; followers either acknowledge the leader's proposal or they abandon the leader. The lack of aborts also means that we can commit once a quorum of servers ack the proposal rather than waiting for all servers to respond. This simplified two-phase commit by itself cannot handle leader failures, so we add a recovery mode to handle leader failures.
  7. If a Zab server comes online while a leader is actively broadcasting messages, the server will start in recovery mode, discover and synchronize with the leader, and start participating in the message broadcasts. (Zab) For example, a Zab service made up of three servers, where one is the leader and the two other servers are followers, moves to broadcast mode. If one of the followers dies, there is no interruption in service since the leader still has a quorum. If that follower recovers and the other one dies, there is still no service interruption.
  8. The log-count threshold snapCount for writing out a snapshot can be set via the zookeeper.snapCount system property; the default is 100000 entries. When there are few requests (including reads), every update is flushed out quickly even if fewer than 1000 entries have been written. Under heavy request pressure, entries keep accumulating until 1000 are reached before the log file is flushed and the message is sent on to the next stage.
  9. When a server starts, it initiates an election (the order matters); after the election it synchronizes data with the leader, and once more than half of the servers have completed synchronization this phase ends. Election algorithm: http://zookeeper.apache.org/doc/r3.2.2/zookeeperInternals.html#sc_leaderElection http://rdc.taobao.com/blog/cs/?p=162 Source code: http://blog.sina.com.cn/s/blog_3fe961ae01012dkk.html ZooKeeper has three classes for leader election: FastLeaderElection, LeaderElection, and AuthFastLeaderElection. FastLeaderElection is used by default; it lives in the org.apache.zookeeper.server.quorum package. If a majority of followers fail to sync within this period, then the leader renounces its leadership status and another leader election takes place. If this happens often (and you can discover if this is the case because it is logged), it is a sign that the setting is too low. The core of ZooKeeper is atomic broadcast, which keeps the servers in sync; the protocol implementing this mechanism is called Zab. Zab has two modes: recovery and broadcast. When the service starts, or after the leader crashes, Zab enters recovery mode; recovery mode ends once a leader has been elected and a majority of servers have completed state synchronization with it. State synchronization guarantees the leader and the servers have the same system state. Once the leader has synchronized state with a majority of the followers, it can begin broadcasting messages, i.e. it enters broadcast mode. When a server joins the ZooKeeper service at that point, it starts in recovery mode, discovers the leader, synchronizes with it, and on finishing synchronization also participates in message broadcast. The ZooKeeper service stays in broadcast mode until the leader crashes or loses the support of a majority of followers. Broadcast mode closely resembles 2PC (two-phase commit) in distributed transactions: the leader raises a resolution, the followers vote, and the leader tallies the votes to decide whether the resolution passes; if it passes, the transaction is executed, otherwise nothing is done.
  10. http://rdc.taobao.com/blog/cs/?p=162
  11. ZooKeeper does not ensure that every client obtains (i.e. for a read request) the same data, unless the client itself asks for it by calling sync (org.apache.zookeeper.AsyncCallback.VoidCallback, java.lang.Object) before reading. Usually (here "usually" means: 1. it does not matter whether the data obtained is the latest version, and 2. it is not required that once one client modifies data, other clients can immediately obtain the new value), this can be ignored. Otherwise, the clearest scenario is: ZK client A changes the content of /my_test from v1 to v2, but ZK client B reading /my_test still obtains v1. Note that this phenomenon actually occurs, though the delay is very short. The solution is for client B to call sync() before calling getData().
  12. Considerations: 1. With more than half, the service can already be provided. 2. Ensure that the server a client connects to has the latest data. 3. Some servers' data may lag the leader by a few updates. 4. What if one server's network is poor and it never finishes updating? Performance would be terrible.
  13. http://zookeeper.apache.org/doc/r3.3.2/recipes.html lock: http://agapple.iteye.com/blog/1184040 typical application scenarios: http://rdc.taobao.com/team/jm/archives/1232
  14. http://rdc.taobao.com/team/jm/archives/448/comment-page-1#comment-4309