Meet HBase 2.0 and Phoenix 5.0

1 © Hortonworks Inc. 2011–2018. All rights reserved
Meet HBase 2.0 and Phoenix 5.0
Ankit Singhal and Rajeshbabu Chintaguntla
HBase & Phoenix Engineers in Hortonworks

About Us
Ankit Singhal
- Phoenix PMC and HBase contributor
- HBase dev @Hortonworks
Rajeshbabu Chintaguntla
- Phoenix PMC and HBase Committer
- HBase dev @Hortonworks

Outline
Recent releases
Versioning / Compatibility
Changes in behavior
Major Features in HBase-2.0
Phoenix- 5.0

Outline
Recent releases
Changes in behavior
Phoenix-5.0
Picture taken from: https://www.ostraining.com/blog/joomla/content-versioning-a-sneak-peek-at-joomla-s-newest-proposed-feature/

2018 (repo and releases)
(master) 3.0.0-SNAPSHOT
(branch-1) 1.5.0-SNAPSHOT
(branch-1.4)
1.4.3
1.3.21.3.1
(branch-1.3)
1.2.0 1.2.6
1.1.0 1.1.13
(branch-1.2)
(branch-1.1)
(branch-2) 2.1.0-SNAPSHOT
1.4.4
2.0.-beta-1-2 2.0.0 2.1.0
(expected)
3.0.0
1.4.5

How to choose a release
• If you are on 0.98.x or earlier, time to upgrade!
• 1.0 is EOL’ed. Use 1.3.x at least
• Both 1.3.2 and 1.4.4 are pretty stable
• Moving between minor versions is easy for 1.x
• Starting from scratch, use 1.4.4 or 2.0.1(RC)/2.1.0(planned)
Image is taken from:http://meritcd.com/blogs/improve-your-decision-making-improve-your-leadership-2/

Outline
Recent releases
Changes in behavior
Phoenix-5.0

Semantic Versioning
Starting with the 1.0 release, HBase works toward Semantic Versioning
MAJOR.MINOR.PATCH[-identifiers]
PATCH: only BC bug fixes.
MINOR: BC new features
MAJOR: Incompatible changes

To be, or not to be (Compatible)

To be, or not to be (Compatible)
• Compatibility is NOT a simple yes or no
• Many dimensions
• source, binary, wire, command line, dependencies etc
• Read https://hbase.apache.org/book.html#upgrading

Major Minor Patch
Client-Server Wire Compatibility ✗ ✓ ✓
Server-Server Compatibility ✗ ✓ ✓
File Format Compatibility ✗* ✓ ✓
Client API Compatibility ✗ ✓ ✓
Client Binary Compatibility ✗ ✗ ✓
Server Side Limited API Compatibility ✗ ✗*/✓* ✓
Dependency Compatibility ✗ ✓ ✓
Operation Compatibility ✗ ✗ ✓
Compatibilities

HBase API surface
• Client API (For Application Developer)
• Explicitly marked with InterfaceAudience.Public
• Get/Put/Table/Connection, etc
• LimitedPrivate API (For Advanced user)
• Explicitly marked with InterfaceAudience.LimitedPrivate
• Coprocessors, replication APIs
• Private API (For HBase Internal use)
• Explicitly marked with InterfaceAudience.Private
• All other classes not marked
• Also InterfaceAudience.{Stable,Evolving,Unstable}

Outline
Recent releases
Changes in behavior
Phoenix-5.0

Changes in behavior – HBase-2.0
• JDK-8+ only
• Hadoop-2.7+ (Hadoop-3 support)
• Filter and Coprocessor changes
• Managed connections are gone.
• “Rolling upgradable not supported” from 1.x
• Some DDL / Admin operations may not work with the old client
• Updated dependencies
• Deprecated client interfaces are gone
https://docs.google.com/document/d/1WCsVlnHjJeKUcl7wHwqb4z9iEu_ktczrlKHK8N4SZzs/edit#heading=h.v21r9nz8g01j

HBase 0.98 Name(s) HBase 1.x + 2.x name(s)
HConnectionManager,
ConnectionManager
ConnectionFactory
HConnection, ClusterConnection Connection
HBaseAdmin Admin
HTable(HTableInterface) Table
RegionLocator
HTableDescriptor, HColumnDescriptor,
HRegionInfo
TableDescriptor,
ColumnFamilyDescriptor,
RegionInfo
API Changes

Outline
Recent releases
Changes in behavior
Phoenix-5.0

Offheaping Read Path
• Goals
• Make use of all available RAM
• Reduce temporary garbage for reading
• Make bucket cache as fast as LRU cache
• HBase block cache can be configured as L1 / L2
• “Bucket cache” can be backed by onheap, off-heap or SSD
• Different sized buckets in 4MB slices
• HFile blocks placed in different buckets
• Earlier Cell comparison only used to deal with byte[] and copy the whole block to heap

https://www.slideshare.net/HBaseCon/offheaping-the-apache-hbase-read-path

https://blogs.apache.org/hbase/entry/offheaping_the_read_path_in1

Offheap Read-Path in Production - The Alibaba story (https://blogs.apache.org/hbase/)

Offheaping Write Path
• Reduce GC, improve throughput predictability
• Use off-heap buffer pools to read RPC requests
• Remove garbage created in Codec parsing, MSLAB copying, etc
• MSLAB is used for heap fragmentation
• Now MSLAB can allocate offheap pooled buffers [2MB]

Offheaping Write Path
• Allows bigger total memstore space
• Cell data goes off-heap
• Cell metadata (and CSLM#Entry, etc) is on-heap (~100 byte overhead)
• Async WAL is used to avoid copying Cells into heap
• Works with new “Compacting Memstore”

Compacting Memstore
• In memory flushes
• Reduce index overhead per cell
• Gains proportional to cell size
• In memory compaction
• Eliminate duplicates
• Gains are proportional to duplicates
• Reduce flushes to disk
• Reduce file creation
• Reduce write amplification

Compacting Memstore
https://www.slideshare.net/HBaseCon/apache-hbase-accelerated-inmemory-flush-and-compaction

26

https://www.slideshare.net/HBaseCon/apache-hbase-accelerated-inmemory-flush-and-compaction

Async Client
• A complete new client maintained in HBase
• Different than OpenTSDB/asynchbase
• Fully async end to end
• Based on Netty and JDK-8 CompletableFutures
• Includes most critical functionality including Admin / DDL
• Some advanced functionality (read replicas, etc) are not done yet
• Have both sync and async client.

Async Client

Region Assignment
• Region assignment is one of the pain points
• Complete overhaul of region assignment
• Based on the internal “Procedure V2” framework
• Series of operations are broken down into states in State Machine
• State’s are serialized to durable storage
• Uses hbase:meta to store the state of region assignments
• States of DDLs are stored in WAL on HDFS
• Backup master can recover the state from where left off

Backup / Restore
• Need a reliable native Backup mechanism

Backup / Restore
• Full backups and Incremental backups
• Backup destination to a remote FS
• HDFS, S3, ADLS, WASB, and any Hadoop-compatible FS
• Full backup based on “snapshots”
• Incremental backup ships Write-Ahead-Logs
• Backup + Replication are complimentary (for a full DR solution)

Backup / Restore
• Flexible restore options

FileSystem Quotas
• HBase supports quotas
• Request count or size per sec,min,hour,day (per server)
• Num tables / num regions in namespace
• New work enables quotas for disk space usage
• Master periodically gathers info about disk space usage
• Regionservers enforce limit when notified from master
• Limits per table or per namespace (not per user)
• Violation policy: {DISABLE, NO_WRITES_COMPACTIONS, NO_WRITES, NO_INSERTS}

FileSystem Quotas
• Much more complex when snapshots are in the picture (similar to quotas with hard
links in FS)
• Snapshot usage is counted against the originating table
• A File is only counted once (even original table, or snapshot or restored table refers to
it)
• More info: HBASE-16961 => ‘prod_website’, LIMIT => ‘7125GB’
hbase> set_quota TYPE => SIZE, VIOLATION_POLICY
=>DELETES_WITH_COMPACTIONS, NAMESPACE =>
‘prod_website’, LIMIT => ‘7125GB’

Will be released for the first time
• These will be released for the first time in Apache (although other backports exist)
• Medium Object Support
• Region Server groups
• Favored Nodes enhancements
• HBase-Spark module
• WAL’s and data files in separate FileSystems

Outline
Recent releases
Changes in behavior
Phoenix- 5.0

Phoenix 5.0 - Outline
• Phoenix releases
• Index Improvements.
• Join Improvements.
• Major Features

2018 (repo and releases)
(master) 4.15.0-SNAPSHOT
(5.x-HBase-2.0) 5.0.0-SNAPSHOT
5.0.0
4.15.04.14.0
5.0.0-alpha-1
4.13.0
5.x will become master after 5.0 release

Join Improvements

Cost based Optimizations for Joins
• Phoenix support two Join algorithms
• Hash Join
• Sort Merge Join.
• Prior to 4.14.0 user need to explicitly Hint compiler to use Sort Merge Join when
required.
• From 4.14.0 onwards best join algorithm selection happen automatically based on
query join predicate and ordering

• Sort merge join wins over Hash Join when both the children are ordered.
• Hash-join wins over sort-merge-join w/ smaller side ordered.
• Hash-join wins over sort-merge-join w/o any side ordered.
• Hash-join wins over sort-merge-join w/ both sides ordered in an order-by query – client
need to sort merge the results.
• Multi-table join: a mix of join strategies chosen based on cost.

Index Improvements

Index Write Failure Handling
• Phoenix supports many strategies for handling index failures.
• Blocking writes to data table until index restored to consistent state.
• Disable index with timestamp at which failure occurred and rebuild index in the background until
consistent state reached.
• Disable index and manual index rebuilding from MR tool to avoid overhead at server.
• Kill the RS so that the index updates replayed or recovered to consistent state.

Index Write Failure Handling (Cont’d..)
Client Retries On Index Failure
• Prior to Phoenix 5.0.0 the retry happen at server on index write failures and disable
index even if the write not succeeded after multiple retries.
• This adds unnecessary overhead to server and block the handlers.
• Now the client handles the retries without any blocking.

Incremental Index Rebuilding
• When data accumulated to rebuild is more(hours or days of data) after index failures
then it’s difficult to rebuild the indexes in single pass.
• So in latest version the index rebuilding happen in batches with check pointing by
timestamp so that any failures happen in the middle rebuilding starts from last
successful checkpoint.
• Automatic rebuilding stops when user initiates manual index rebuilding.

MR based Index Rebuilding
• Alter index with REBUILD ASYNC option.
• Use IndexTool to rebuild the index.

Global Index performance Improvement
• Presplit index table before asynchronous index building.
• Until unless we know about the indexed column value data set it’s difficult to pre define the
boundaries of index table.
• Automatic process has implemented in 4.14.0 version where tool identifies the approximate split
keys by collecting the index key values from data table sample(or full table) where the sampling
rate can be changed from 0 to 100%.
• The tool builds Equi-depth histogram to identify the approximate index region boundaries.

Local Index performance Improvements
• As of now when we scan through local index we need to touch base all the regions
irrespective of filters.
• In the current version we can filter the regions to scan for local indexes based on data
table leading primary key filter conditions.

Kafka Integration

Apache Kafka Integration
• Apache Kafka plugin helps to reliably and efficiently stream
large amounts of data onto HBase using Phoenix API. As part
of this plugin we are providing PhoenixConsumer to receive
messages from KafkaProducer.

Major Features

Support for GRANT and REVOKE commands
• Internally uses HBase ACLs only, possible permissions are R - Read, W - Write, X -
Execute, C - Create and A - Admin.
• Access needs to be given on data table which automatically get propagated to all its
indexes and views
• No need to have WRITE access on SYSTEM.CATALOG table now for DDLs
• New Indexes will be given same permission as data table.

Query log
• Run time information of the query is logged in the Phoenix table(SYSTEM.LOG)
• There are around 40+ columns of information to help in analzing the query at any point
of time.
• Logging is done Asynchronously and can be controlled through log
level(TRACE,DEBUG,INFO,OFF)

Python driver for Phoenix Query Server
• It implements the Python DB 2.0 API to access Phoenix via the Phoenix Query Server
• Example

Persistent Subquery Results Cache - WIP
• Persisting expensive subqueries data beyond life of query helps in significant
performance gain when the same subquery repeated in multiple queries.
• Make use of coprocessor based server cache to avoid caching at client.
• WIP at https://issues.apache.org/jira/browse/PHOENIX-4666

Upgrades/New
• Apache Omid Integration for Transactions Support( along with Tephra)
• Hive 3.0.0
• Hadoop 3.0.0
• Spark 2.3.0
• Tephra 0.14.0
• Avatica 0.11.0

Questions?

Thank you

Meet HBase 2.0 and Phoenix 5.0

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

Similaire à Meet HBase 2.0 and Phoenix 5.0

Similaire à Meet HBase 2.0 and Phoenix 5.0 (20)

Dernier

Dernier (20)

Meet HBase 2.0 and Phoenix 5.0

Notes de l'éditeur