This presentation will give an overview of two exciting releases for Apache HBase and Phoenix. HBase 2.0 is the next stable major release for Apache HBase scheduled for early 2018. It is the next evolution from the Apache HBase community after 1.0. HBase-2.0 contains a large number of features that are long time in development, some of which include rewritten region assignment , perf improvements (RPC, rewritten write pipeline, etc), async clients and WAL, C++ client, offheaping memstore and other buffers, shading of dependencies, as well as a lot of other fixes and stability improvements. We will go into technical details on some of the most important improvements in the release, as well as what are the implications for the users in terms of API and upgrade paths. Phoenix 5.0 is the next biggest release because of its integration with HBase 2.0 and lot of performance improvements in support of secondary Indexes. It has a lot of cool features such as encoded columns, Kafka, Hive integration, and many other performance improvements.
We have presented this at Data work summit 2018 in San Jose.
it may be possible that you will use less physical resources on latest releases than the old ones because of performance benefits.
from personal opinion, always better to use last release with a patch no., as it is expected to be more stable, if you are not in hurry of getting new features
Dependency Compatibility: An upgrade of HBase will not require an incompatible upgrade of a dependent project, including the Java runtime.
Operational Compatibility: Metric changes, Behavioral changes of services, JMX APIs exposed via the /jmx/ endpoint
Client Binary compatibility: Client code written using APIs doesnot recompilation
Rolling upgrade you can’t run DDL with old client
Each thread does 100 row multigets 100k, 10M rows
Offheal size : 105 GB
Single node cluster
Througput 41k gets/second
25micro second per get
ByteBuffer classes are extended to optimize the storage of offset and length
NIO direct byte buffers are performed better than netty byte buffer in JMH comparison
Cell class is extended with new methods for accessing ByteBuffer from offheap directly without copying , these methods are kept private and has no change from public interface for the cell.
Some decision like using abstract class instead of interface for ByteBufferCell was decided on performance benchmarks because of Java L2 compiler inlining which was happening with interface but happening with abstract class.
Graph was bad because of GC pauses
To optimize java socket implementation during reading RPC requests where the data is temp copy from onheap to offheap buffer before copying to destination on heap buffer, with offheap buffer pools we can skip this temp copy and read RPC requests directly in offheap buffer.
MSLAB pooling allocates fixed size byte buffer similar to traditional MSLAB pooling of onheap byte array to avoid fragmentation of memory. We create a pooll of fixed size buffers , and cells are stored in pipeline of these buffers ,
MS
CSLM is concurrentSkipListMap (it is a memstore data structure). To avoid heap usage by CSLM metadata, New Data structure is used to keep the cells in flattened array.
Purpose of inmemory flushes is to have data stay more in memory for faster retrievel. Data structure is optimized to reduce index overhead of CSLM
ComplimentsTephra
ACID properties are Hard to scale when databases have to deal with very large amounts of data and thousands of concurrent users because the data must be partitioned, distributed and replicated.
OMID provides lock-free transactional support on top of Hbase with snapshot isolation guarantees.