Learning Objectives - This module covers advanced HBase concepts. You will also learn what ZooKeeper is, how it helps in monitoring a cluster, why HBase uses ZooKeeper, and how to build applications with ZooKeeper.
5. Example: Mail Inbox
<userId> : <colfam> : <messageId> : <timestamp> : <email-message>
12345 : data : 5fc38314-e290-ae5da5fc375d : 1307097848 : "Hi Lars, ..."
12345 : data : 725aae5f-d72e-f90f3f070419 : 1307099848 : "Welcome, and ..."
12345 : data : cc6775b3-f249-c6dd2b1a7467 : 1307101848 : "To Whom It ..."
12345 : data : dcbee495-6d5e-6ed48124632c : 1307103848 : "Hi, how are ..."
OR
12345-5fc38314-e290-ae5da5fc375d : data : : 1307097848 : "Hi Lars, ..."
12345-725aae5f-d72e-f90f3f070419 : data : : 1307099848 : "Welcome, and ..."
12345-cc6775b3-f249-c6dd2b1a7467 : data : : 1307101848 : "To Whom It ..."
12345-dcbee495-6d5e-6ed48124632c : data : : 1307103848 : "Hi, how are ..."
Same Storage Requirements
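The two layouts above can be compared with a small sketch. It is a toy model using plain Python dicts, not the HBase client API; the function names (wide_layout, tall_layout, prefix_scan, cell_count) are made up for illustration. It shows that both designs store the same number of cells, and that the tall design still finds all of a user's messages through a row-key prefix scan, because rows are kept sorted by key.

```python
from bisect import bisect_left

# Sample data copied from the slide: (user_id, message_id, timestamp, body)
MESSAGES = [
    ("12345", "5fc38314-e290-ae5da5fc375d", 1307097848, "Hi Lars, ..."),
    ("12345", "725aae5f-d72e-f90f3f070419", 1307099848, "Welcome, and ..."),
    ("12345", "cc6775b3-f249-c6dd2b1a7467", 1307101848, "To Whom It ..."),
    ("12345", "dcbee495-6d5e-6ed48124632c", 1307103848, "Hi, how are ..."),
]


def wide_layout(messages):
    """One row per user; each message ID becomes a column qualifier."""
    table = {}
    for user_id, msg_id, ts, body in messages:
        table.setdefault(user_id, {})[("data", msg_id)] = (ts, body)
    return table


def tall_layout(messages):
    """One row per message; the row key is '<userId>-<messageId>'."""
    table = {}
    for user_id, msg_id, ts, body in messages:
        table[f"{user_id}-{msg_id}"] = {("data", ""): (ts, body)}
    return table


def cell_count(table):
    """Total number of stored cells across all rows."""
    return sum(len(columns) for columns in table.values())


def prefix_scan(table, prefix):
    """Return the row keys that start with prefix, using sorted key order."""
    keys = sorted(table)
    matches = []
    for key in keys[bisect_left(keys, prefix):]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        matches.append(key)
    return matches
```

Calling prefix_scan(tall_layout(MESSAGES), "12345-") returns the four message rows of user 12345 in key order, which is the tall-table equivalent of reading the single wide row.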
6. Secondary Indexes
Although HBase has no native support for secondary indexes, there are
use cases that need them. The requirement is usually that you can look
up a cell not just by its primary coordinates (the row key, column
family name, and qualifier) but also by an alternative coordinate. In
addition, you should be able to scan a range of rows from the main
table, ordered by the secondary index.
• Client-managed
• Indexed-Transactional HBase
• Indexed HBase
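The client-managed option can be sketched as follows. This is a toy model with two in-memory dicts standing in for a main table and an index table; the class and method names are hypothetical and this is not the HBase API. The point is that the client itself writes the index entry alongside every data write, and reads the index in sorted order to get rows ordered by the secondary value.

```python
class IndexedTable:
    """Toy main table plus a client-maintained index table."""

    def __init__(self):
        self.main = {}   # row key -> {column: value}
        self.index = {}  # (indexed value, row key) -> row key

    def put(self, row_key, column, value):
        """Write the cell, then update the index entry for it."""
        old = self.main.get(row_key, {}).get(column)
        self.main.setdefault(row_key, {})[column] = value
        if old is not None:
            # Remove the stale index entry left by the previous value.
            self.index.pop((old, row_key), None)
        self.index[(value, row_key)] = row_key

    def lookup(self, value):
        """Find row keys by the alternative coordinate."""
        return [rk for (v, rk) in sorted(self.index) if v == value]

    def scan_by_index(self):
        """Return row keys of the main table, ordered by the indexed value."""
        return [rk for (_, rk) in sorted(self.index)]
```

In this sketch the data write and the index write are two separate steps. Against a real cluster those two writes would not be atomic, so a client-managed design has to tolerate or repair a stale or missing index entry.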
7. Coprocessors
• Think of this as a small MapReduce framework that distributes
work across the entire cluster.
• A coprocessor lets you run arbitrary code directly on each
region server.
• It executes the code on a per-region basis, giving trigger-like
functionality.
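The per-region idea can be illustrated with a toy simulation. Nothing here is the HBase coprocessor API: Region, Cluster, register_pre_put and exec_on_regions are hypothetical names chosen for this sketch. A hook registered on every region runs before each write (the trigger-like part), and a function can be executed once per region with the partial results combined by the caller (the small-MapReduce part).

```python
class Region:
    """Toy region holding a contiguous slice of the row-key space."""

    def __init__(self, start_key, end_key):
        self.start_key = start_key
        self.end_key = end_key  # exclusive; None means unbounded
        self.rows = {}
        self.pre_put_hooks = []

    def owns(self, row_key):
        return row_key >= self.start_key and (
            self.end_key is None or row_key < self.end_key
        )

    def put(self, row_key, value):
        # Trigger-like behavior: every hook sees the write before it is stored.
        for hook in self.pre_put_hooks:
            value = hook(row_key, value)
        self.rows[row_key] = value


class Cluster:
    """Toy cluster that routes writes to the region owning the row key."""

    def __init__(self, regions):
        self.regions = regions

    def put(self, row_key, value):
        region = next(r for r in self.regions if r.owns(row_key))
        region.put(row_key, value)

    def register_pre_put(self, hook):
        for region in self.regions:
            region.pre_put_hooks.append(hook)

    def exec_on_regions(self, fn):
        """Run fn once per region and return the partial results."""
        return [fn(region) for region in self.regions]
```

A row count, for example, becomes sum(cluster.exec_on_regions(lambda r: len(r.rows))): each region counts its own rows locally and only the small partial counts travel back to the client.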
8. ZooKeeper
• An open source server that reliably coordinates distributed
processes.
• Apache ZooKeeper provides operational services for a Hadoop
cluster.
• ZooKeeper provides a distributed configuration service, a
synchronization service and a naming registry for distributed
systems.
• Distributed applications use ZooKeeper to store and mediate
updates to important configuration information.
9. ZooKeeper Service: Data Model
• Znode
– In-memory data node in the ZooKeeper data tree
– Organized in a hierarchical namespace
– UNIX-like notation for paths
• Types of Znode
– Persistent
– Ephemeral
• Flags of Znode
– Sequential (a monotonically increasing number is appended to the znode name)
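The data model above can be sketched with a toy in-memory tree. This is not the ZooKeeper client API; ZnodeTree and its methods are hypothetical names, and session IDs are plain integers. The sketch assumes the usual ZooKeeper behavior that an ephemeral znode disappears when the session that created it ends, that ephemeral znodes cannot have children, and that the sequential flag appends a 10-digit zero-padded counter to the name.

```python
class ZnodeTree:
    """Toy in-memory znode tree addressed by UNIX-like paths."""

    def __init__(self):
        # path -> {"data": bytes, "owner": session id, or None if persistent}
        self.nodes = {"/": {"data": b"", "owner": None}}
        self.counters = {}  # parent path -> next sequence number

    def create(self, path, data=b"", ephemeral_owner=None, sequential=False):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if self.nodes[parent]["owner"] is not None:
            raise ValueError("ephemeral znodes cannot have children")
        if sequential:
            n = self.counters.get(parent, 0)
            self.counters[parent] = n + 1
            path = f"{path}{n:010d}"  # append the per-parent counter
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = {"data": data, "owner": ephemeral_owner}
        return path

    def children(self, path):
        """Names of the direct children of path, sorted."""
        prefix = path.rstrip("/") + "/"
        names = []
        for p in self.nodes:
            rest = p[len(prefix):]
            if p != "/" and p.startswith(prefix) and "/" not in rest:
                names.append(rest)
        return sorted(names)

    def close_session(self, session_id):
        """Remove every ephemeral znode owned by the closed session."""
        for p in [p for p, n in self.nodes.items() if n["owner"] == session_id]:
            del self.nodes[p]
```

Ephemeral sequential znodes such as /app/worker-0000000000 are the usual building block for group membership: each worker creates one, and the list of children of /app shrinks automatically when a worker's session ends.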
11. The ZooKeeper service can run in two modes.
• In standalone mode, there is a single ZooKeeper server, which is
useful for testing due to its simplicity (it can even be embedded in
unit tests), but provides no guarantees of high-availability or
resilience.
• In production, ZooKeeper runs in replicated mode, on a cluster of
machines called an ensemble. ZooKeeper achieves high-availability
through replication, and can provide a service as long as a majority of
the machines in the ensemble are up.
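The majority rule can be worked through with a few lines of arithmetic. The helper names below are made up for this sketch; the only assumption is the rule stated above, that the service stays available while more than half of the ensemble is up.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


def is_available(ensemble_size, servers_up):
    """True if enough servers are up to form a majority."""
    return servers_up >= quorum_size(ensemble_size)
```

A five-node ensemble needs three servers up and tolerates two failures. A six-node ensemble needs four and still tolerates only two, which is why ensembles are usually sized with an odd number of machines.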
ZooKeeper Service: Implementation
13. ZooKeeper Service: Sessions
• A ZooKeeper client is configured with the list of servers in the ensemble.
On startup, it tries to connect to one of the servers in the list.
• Once a connection has been made with a ZooKeeper server, the server
creates a new session for the client.
• Sessions are kept alive by the client sending ping requests (also known as
heartbeats) whenever the session is idle for longer than a certain period.
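The session behavior above can be sketched as a small client-side model. It is a simulation with an explicit clock, not the ZooKeeper client library; the Session class, its method names, and the choice of pinging after one third of the timeout are assumptions made for this example. In a real deployment the server, not the client, decides when a session has expired.

```python
import random


class Session:
    """Toy client-side session; times are plain numbers (seconds)."""

    def __init__(self, servers, timeout, now=0.0, rng=None):
        self.servers = list(servers)
        self.timeout = timeout
        # Assumption for this sketch: ping once idle for a third of the timeout.
        self.ping_interval = timeout / 3
        # On startup, connect to one of the servers in the configured list.
        self.server = (rng or random).choice(self.servers)
        self.last_sent = now

    def record_request(self, now):
        """Any request, including a ping, resets the idle clock."""
        self.last_sent = now

    def needs_ping(self, now):
        """True once the session has been idle for the ping interval."""
        return now - self.last_sent >= self.ping_interval

    def is_expired(self, now):
        """True if nothing was sent for longer than the session timeout."""
        return now - self.last_sent > self.timeout
```

A client loop would call needs_ping on a timer and send a heartbeat whenever it returns True, so that an idle but healthy client never reaches the timeout.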