1. 1 © Hortonworks Inc. 2011–2018. All rights reserved
Meet HBase 2.0 and Phoenix 5.0
Ankit Singhal and Rajeshbabu Chintaguntla
HBase & Phoenix Engineers at Hortonworks
2. 2 © Hortonworks Inc. 2011–2018. All rights reserved
About Us
Ankit Singhal
- Phoenix PMC
- HBase dev @Hortonworks
Rajeshbabu Chintaguntla
- Phoenix PMC and HBase Committer
- HBase dev @Hortonworks
3. 3 © Hortonworks Inc. 2011–2018. All rights reserved
Outline
Recent releases
Versioning / Compatibility
Changes in behavior
Major Features in HBase-2.0
Phoenix-5.0
4. 4 © Hortonworks Inc. 2011–2018. All rights reserved
Outline
Recent releases
Versioning / Compatibility
Changes in behavior
Major Features in HBase-2.0
Phoenix-5.0
Picture taken from: https://www.ostraining.com/blog/joomla/content-versioning-a-sneak-peek-at-joomla-s-newest-proposed-feature/
5. 5 © Hortonworks Inc. 2011–2018. All rights reserved
2018 (repo and releases)
[Figure: release timeline per branch]
• master (3.0.0-SNAPSHOT): 3.0.0
• branch-2 (2.1.0-SNAPSHOT): 2.0.0-beta-1, 2.0.0-beta-2, 2.0.0, 2.1.0
• branch-1 (1.5.0-SNAPSHOT)
• branch-1.4: 1.4.3, 1.4.4, 1.4.5
• branch-1.3: 1.3.1, 1.3.2
• branch-1.2: 1.2.0 to 1.2.6
• branch-1.1: 1.1.0 to 1.1.13
• Upcoming releases are marked (expected) in the diagram
6. 6 © Hortonworks Inc. 2011–2018. All rights reserved
How to choose a release
• If you are on 0.98.x or earlier, time to upgrade!
• 1.0 is EOL’ed. Use 1.3.x at least
• Both 1.3.2 and 1.4.4 are pretty stable
• Moving between minor versions is easy for 1.x
• Starting from scratch, use 1.4.4 or 2.0.1(RC)/2.1.0(planned)
Picture taken from: http://meritcd.com/blogs/improve-your-decision-making-improve-your-leadership-2/
7. 7 © Hortonworks Inc. 2011–2018. All rights reserved
Outline
Recent releases
Versioning / Compatibility
Changes in behavior
Major Features in HBase-2.0
Phoenix-5.0
8. 8 © Hortonworks Inc. 2011–2018. All rights reserved
Semantic Versioning
Starting with the 1.0 release, HBase works toward Semantic Versioning
MAJOR.MINOR.PATCH[-identifiers]
PATCH: only backward-compatible (BC) bug fixes
MINOR: backward-compatible new features
MAJOR: incompatible changes
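The MAJOR.MINOR.PATCH scheme above can be turned into a quick check of what kind of upgrade a version change represents. This is a minimal sketch only; the helper names `parse_version` and `upgrade_kind` are hypothetical and not part of HBase.

```python
import re

def parse_version(version):
    """Split 'MAJOR.MINOR.PATCH[-identifiers]' into three integers and a suffix."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:-(.+))?", version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch, identifiers = match.groups()
    return int(major), int(minor), int(patch), identifiers

def upgrade_kind(current, target):
    """Return the most significant version component that differs."""
    cur = parse_version(current)[:3]
    new = parse_version(target)[:3]
    if new[0] != cur[0]:
        return "major"   # incompatible changes are possible
    if new[1] != cur[1]:
        return "minor"   # backward-compatible new features
    if new[2] != cur[2]:
        return "patch"   # backward-compatible bug fixes only
    return "same"

print(upgrade_kind("1.4.3", "1.4.4"))
print(upgrade_kind("1.3.2", "1.4.4"))
print(upgrade_kind("1.4.4", "2.0.0"))
```

A "major" result is the signal to read the upgrade notes before moving, since wire, API and file-format compatibility are not guaranteed across major versions.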
9. 9 © Hortonworks Inc. 2011–2018. All rights reserved
To be, or not to be (Compatible)
10. 10 © Hortonworks Inc. 2011–2018. All rights reserved
To be, or not to be (Compatible)
• Compatibility is NOT a simple yes or no
• Many dimensions
• source, binary, wire, command line, dependencies etc
• Read https://hbase.apache.org/book.html#upgrading
11. 11 © Hortonworks Inc. 2011–2018. All rights reserved
HBase API surface
• Client API
• Explicitly marked with InterfaceAudience.Public
• Get/Put/Table/Connection, etc
• LimitedPrivate API
• Explicitly marked with InterfaceAudience.LimitedPrivate
• Coprocessors, replication APIs
• Private API
• Explicitly marked with InterfaceAudience.Private
• All other classes not marked
• Also InterfaceStability.{Stable,Evolving,Unstable}
12. 12 © Hortonworks Inc. 2011–2018. All rights reserved
Compatibilities
                                         Major   Minor    Patch
Client-Server Wire Compatibility         ✗       ✓        ✓
Server-Server Compatibility              ✗       ✓        ✓
File Format Compatibility                ✗*      ✓        ✓
Client API Compatibility                 ✗       ✓        ✓
Client Binary Compatibility              ✗       ✗        ✓
Server Side Limited API Compatibility    ✗       ✗*/✓*    ✓
Dependency Compatibility                 ✗       ✓        ✓
Operational Compatibility                ✗       ✗        ✓
13. 13 © Hortonworks Inc. 2011–2018. All rights reserved
Outline
Recent releases
Versioning / Compatibility
Changes in behavior
Major Features in HBase-2.0
Phoenix-5.0
14. 14 © Hortonworks Inc. 2011–2018. All rights reserved
Changes in behavior – HBase-2.0
• JDK-8+ only
• Hadoop-2.7+ and Hadoop-3 support
• Filter and Coprocessor changes
• Managed connections are gone.
• Rolling upgrade from 1.x is “not supported”
• Updated dependencies
• Deprecated client interfaces are gone
https://docs.google.com/document/d/1WCsVlnHjJeKUcl7wHwqb4z9iEu_ktczrlKHK8N4SZzs/edit#heading=h.v21r9nz8g01j
15. 15 © Hortonworks Inc. 2011–2018. All rights reserved
How to migrate to HBase-2.0
• 2.0 contains more API cleanup
• Clean up PB (protobuf) and Guava “leaks” into the API
• Some deprecated APIs (HConnection, HTable, HBaseAdmin, etc) going away
• Start using JDK-8 (and G1). You will like it.
• 1.x client should be able to do read / write / scan against 2.0 clusters
• Some DDL / Admin operations may not work during Rolling upgrade
16. 16 © Hortonworks Inc. 2011–2018. All rights reserved
Outline
Recent releases
Versioning / Compatibility
Changes in behavior
Major Features in HBase-2.0
Phoenix-5.0
17. 17 © Hortonworks Inc. 2011–2018. All rights reserved
Offheaping Read Path
• Goals
• Make use of all available RAM
• Reduce temporary garbage for reading
• Make bucket cache as fast as LRU cache
• HBase block cache can be configured as L1 / L2
• “Bucket cache” can be backed by on-heap, off-heap or SSD
• Different sized buckets in 4MB slices
• HFile blocks placed in different buckets
• Previously, Cell comparison could only deal with byte[]
• So the whole block had to be copied to the heap
18. 18 © Hortonworks Inc. 2011–2018. All rights reserved
https://www.slideshare.net/HBaseCon/offheaping-the-apache-hbase-read-path
19. 19 © Hortonworks Inc. 2011–2018. All rights reserved
https://www.slideshare.net/HBaseCon/offheaping-the-apache-hbase-read-path
20. 20 © Hortonworks Inc. 2011–2018. All rights reserved
Offheaping Read Path
https://blogs.apache.org/hbase/entry/offheaping_the_read_path_in1
21. 21 © Hortonworks Inc. 2011–2018. All rights reserved
Offheaping Read Path
Offheap Read-Path in Production - The Alibaba story (https://blogs.apache.org/hbase/)
22. 22 © Hortonworks Inc. 2011–2018. All rights reserved
Offheaping Write Path
• Reduce GC, improve throughput predictability
• Use off-heap buffer pools to read RPC requests
• Remove garbage created in Codec parsing, MSLAB copying, etc
• MSLAB is used to reduce heap fragmentation
• Now MSLAB can allocate offheap pooled buffers [2MB]
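The pooling idea behind MSLAB can be sketched as follows: cell bytes are copied into fixed-size chunks taken from a pool, so memory is allocated and released in uniform pieces rather than as many small objects. This is only an illustration of the layout. `ChunkPool` and `Slab` are hypothetical names, Python's `bytearray` lives on the heap, and oversized cells are simply rejected here.

```python
class ChunkPool:
    """A fixed set of reusable, equally sized buffers."""

    def __init__(self, chunk_size, pooled_chunks):
        self.chunk_size = chunk_size
        self.free = [bytearray(chunk_size) for _ in range(pooled_chunks)]

    def take(self):
        # Reuse a pooled chunk when one is free; otherwise allocate a new one.
        return self.free.pop() if self.free else bytearray(self.chunk_size)

    def give_back(self, chunk):
        self.free.append(chunk)


class Slab:
    """Copies cell bytes into the current chunk and starts a new chunk when full."""

    def __init__(self, pool):
        self.pool = pool
        self.chunks = [pool.take()]
        self.offset = 0

    def copy_cell(self, data):
        if len(data) > self.pool.chunk_size:
            raise ValueError("cell larger than a chunk (not handled in this sketch)")
        if self.offset + len(data) > self.pool.chunk_size:
            self.chunks.append(self.pool.take())
            self.offset = 0
        start = self.offset
        self.chunks[-1][start:start + len(data)] = data
        self.offset += len(data)
        # A cell is then addressed by (chunk index, offset, length).
        return len(self.chunks) - 1, start, len(data)

    def release(self):
        # Return every chunk to the pool, for example after a flush.
        for chunk in self.chunks:
            self.pool.give_back(chunk)
        self.chunks = []


pool = ChunkPool(chunk_size=8, pooled_chunks=2)
slab = Slab(pool)
print(slab.copy_cell(b"abcde"))
print(slab.copy_cell(b"fghij"))
print(slab.copy_cell(b"kl"))
```

The chunk size of 8 bytes is chosen only to keep the demo readable; the slide quotes 2MB buffers.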
23. 23 © Hortonworks Inc. 2011–2018. All rights reserved
Offheaping Write Path
• Allows bigger total memstore space
• Cell data goes off-heap
• Cell metadata (and CSLM#Entry, etc) is on-heap (~100 byte overhead)
• Async WAL is used to avoid copying Cells into heap
• Works with new “Compacting Memstore”
24. 24 © Hortonworks Inc. 2011–2018. All rights reserved
Compacting Memstore
• In memory flushes
• Reduce index overhead per cell
• Gains proportional to cell size
• In memory compaction
• Eliminate duplicates
• Gains are proportional to duplicates
• Reduce flushes to disk
• Reduce file creation
• Reduce write amplification
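The duplicate-elimination step of in-memory compaction can be illustrated with a simplified model in which a cell is a `(row, column, timestamp, value)` tuple and only the newest versions per key are kept. This is a sketch under those assumptions; the function name and the tuple layout are hypothetical and not HBase's internal representation.

```python
def compact_in_memory(cells, max_versions=1):
    """Keep only the newest max_versions cells per (row, column) key."""
    # Order by key, newest timestamp first within each key.
    ordered = sorted(cells, key=lambda c: (c[0], c[1], -c[2]))
    kept, seen = [], {}
    for row, column, timestamp, value in ordered:
        count = seen.get((row, column), 0)
        if count < max_versions:
            kept.append((row, column, timestamp, value))
            seen[(row, column)] = count + 1
    return kept

segment = [
    ("row1", "cf:a", 1, "v1"),
    ("row1", "cf:a", 3, "v3"),
    ("row2", "cf:a", 2, "x"),
    ("row1", "cf:a", 2, "v2"),
]
compacted = compact_in_memory(segment)
print(compacted)
print(f"cells before: {len(segment)}, after: {len(compacted)}")
```

The more often the same key is overwritten, the more cells are dropped before anything is flushed to disk, which is why the gains are proportional to duplicates.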
25. 25 © Hortonworks Inc. 2011–2018. All rights reserved
Compacting Memstore
https://www.slideshare.net/HBaseCon/apache-hbase-accelerated-inmemory-flush-and-compaction
26. 26 © Hortonworks Inc. 2011–2018. All rights reserved
https://www.slideshare.net/HBaseCon/apache-hbase-accelerated-inmemory-flush-and-compaction
27. 27 © Hortonworks Inc. 2011–2018. All rights reserved
Async Client
• A completely new client maintained in HBase
• Different from OpenTSDB/asynchbase
• Fully async end to end
• Based on Netty and JDK-8 CompletableFutures
• Includes most critical functionality including Admin / DDL
• Some advanced functionality (read replicas, etc) is not done yet
• Both sync and async clients are available.
29. 29 © Hortonworks Inc. 2011–2018. All rights reserved
Region Assignment
• Region assignment is one of the pain points
• Complete overhaul of region assignment
• Based on the internal “Procedure V2” framework
• A series of operations is broken down into states of a state machine
• States are serialized to durable storage
• Uses a WAL on HDFS
• A backup master can recover the state and resume from where it left off
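The recover-and-resume behaviour described above can be sketched with a toy state machine that persists each completed state to a log before moving on. This is a minimal sketch: the state names, the JSON log format and the in-memory list standing in for the WAL are all assumptions for illustration, not the Procedure V2 implementation.

```python
import json

# Hypothetical states for illustration; real procedures define their own.
STATES = ["QUEUED", "OPENING", "OPENED", "FINISHED"]

def run_procedure(region, log, start_state="QUEUED", fail_at=None):
    """Advance through STATES, appending each completed state to the log."""
    index = STATES.index(start_state)
    while index < len(STATES):
        state = STATES[index]
        if state == fail_at:
            raise RuntimeError(f"master crashed before persisting {state}")
        # Persist the state before moving on, like a write-ahead log entry.
        log.append(json.dumps({"region": region, "state": state}))
        index += 1

def recover(region, log):
    """Resume a procedure from the state after the last persisted one."""
    entries = [json.loads(line) for line in log]
    persisted = [e["state"] for e in entries if e["region"] == region]
    if not persisted:
        run_procedure(region, log)
        return
    next_index = STATES.index(persisted[-1]) + 1
    if next_index < len(STATES):
        run_procedure(region, log, start_state=STATES[next_index])

wal = []  # stand-in for the durable log
try:
    run_procedure("region-1", wal, fail_at="OPENED")
except RuntimeError as error:
    print("failure:", error)
recover("region-1", wal)  # what a backup master would do after taking over
print([json.loads(line)["state"] for line in wal])
```

Because every state is written before the next step starts, the recovering process never has to guess how far the operation got.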
30. 30 © Hortonworks Inc. 2011–2018. All rights reserved
Backup / Restore
• Need a reliable native Backup mechanism
31. 31 © Hortonworks Inc. 2011–2018. All rights reserved
Backup / Restore
• Full backups and Incremental backups
• Backup destination to a remote FS
• HDFS, S3, ADLS, WASB, and any Hadoop-compatible FS
• Full backup based on “snapshots”
• Incremental backup ships Write-Ahead-Logs
• Backup + Replication are complementary (for a full DR solution)
32. 32 © Hortonworks Inc. 2011–2018. All rights reserved
Backup / Restore
• Flexible restore options
33. 33 © Hortonworks Inc. 2011–2018. All rights reserved
FileSystem Quotas
• HBase supports quotas
• Request count or size per sec, min, hour, day (per server)
• Num tables / num regions in namespace
• New work enables quotas for disk space usage
• Master periodically gathers info about disk space usage
• Regionservers enforce limit when notified from master
• Limits per table or per namespace (not per user)
• Violation policy: {DISABLE, NO_WRITES_COMPACTIONS, NO_WRITES, NO_INSERTS}
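How a violation policy might translate into allowed and blocked operations can be sketched as a lookup table. The mapping below is an assumption based on the policy names and the usual description of space quotas (NO_INSERTS still permits deletes, NO_WRITES also blocks deletes, NO_WRITES_COMPACTIONS additionally blocks compactions, DISABLE blocks all access). The function name, the operation names and the "usage greater than limit" trigger are hypothetical.

```python
# Operations each policy blocks once a table or namespace exceeds its limit.
WRITES = {"put", "increment", "append"}
BLOCKED = {
    "NO_INSERTS": WRITES,
    "NO_WRITES": WRITES | {"delete"},
    "NO_WRITES_COMPACTIONS": WRITES | {"delete", "compaction"},
    "DISABLE": WRITES | {"delete", "compaction", "get", "scan"},
}

def is_allowed(operation, usage_bytes, limit_bytes, policy):
    """Return True if the operation may proceed under the given quota."""
    if usage_bytes <= limit_bytes:
        return True  # quota not violated, nothing is blocked
    return operation not in BLOCKED[policy]

GB = 1024 ** 3
print(is_allowed("put", 8 * GB, 10 * GB, "NO_INSERTS"))
print(is_allowed("put", 12 * GB, 10 * GB, "NO_INSERTS"))
print(is_allowed("delete", 12 * GB, 10 * GB, "NO_INSERTS"))
print(is_allowed("delete", 12 * GB, 10 * GB, "NO_WRITES"))
```

Keeping deletes available under the mildest policy gives users a way to bring usage back under the limit.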
34. 34 © Hortonworks Inc. 2011–2018. All rights reserved
FileSystem Quotas
• Much more complex when snapshots are in the picture (similar to quotas with hard links in FS)
• Snapshot usage is counted against the originating table
• A file is only counted once (even if the original table, a snapshot, or a restored table refers to it)
• More info: HBASE-16961
hbase> set_quota TYPE => SIZE, VIOLATION_POLICY => DELETES_WITH_COMPACTIONS, NAMESPACE => 'prod_website', LIMIT => '7125GB'
35. 35 © Hortonworks Inc. 2011–2018. All rights reserved
Will be released for the first time
• These will be released for the first time in Apache (although other backports exist)
• Medium Object Support
• Region Server groups
• Favored Nodes enhancements
• HBase-Spark module
• WALs and data files in separate FileSystems
36. 36 © Hortonworks Inc. 2011–2018. All rights reserved
Outline
Recent releases
Versioning / Compatibility
Changes in behavior
Major Features in HBase-2.0
Phoenix-5.0
37. 37 © Hortonworks Inc. 2011–2018. All rights reserved
Phoenix 5.0 - Outline
• API cleanup
• Index Improvements.
• Join Improvements.
• Major Features
38. 38 © Hortonworks Inc. 2011–2018. All rights reserved
HBase 0.98 name(s)                                 HBase 1.x + 2.x name(s)
HConnectionManager, ConnectionManager              ConnectionFactory
HConnection, ClusterConnection                     Connection
HBaseAdmin                                         Admin
HTable (HTableInterface)                           Table, RegionLocator
HTableDescriptor, HColumnDescriptor, HRegionInfo   TableDescriptor, ColumnFamilyDescriptor, RegionInfo
40. 40 © Hortonworks Inc. 2011–2018. All rights reserved
Cost based Optimizations for Joins
• Phoenix supports two join algorithms
• Hash join
• Sort-merge join
• Prior to 4.14.0, users needed to explicitly hint the compiler to use sort-merge join when required.
• From 4.14.0 onwards, the best join algorithm is selected automatically based on the query's join predicate and ordering
41. 41 © Hortonworks Inc. 2011–2018. All rights reserved
• Sort-merge join wins over hash join when both children are ordered.
• Hash join wins over sort-merge join with the smaller side ordered.
• Hash join wins over sort-merge join without any side ordered.
• Hash join wins over sort-merge join with both sides ordered in an order-by query, where the client needs to sort-merge the results.
• Multi-table join: a mix of join strategies chosen based on cost.
43. 43 © Hortonworks Inc. 2011–2018. All rights reserved
Index Write Failure Handling
• Phoenix supports several strategies for handling index write failures:
• Block writes to the data table until the index is restored to a consistent state.
• Disable the index, recording the timestamp at which the failure occurred, and rebuild it in the background until it reaches a consistent state.
• Disable the index and rebuild it manually with the MR tool to avoid overhead on the server.
• Kill the RS so that the index updates are replayed and the index recovers to a consistent state.
44. 44 © Hortonworks Inc. 2011–2018. All rights reserved
Index Write Failure Handling (Cont’d..)
Client Retries On Index Failure
• Prior to Phoenix 5.0.0, retries on index write failures happened at the server, and the index was disabled if the write still had not succeeded after multiple retries.
• This added unnecessary overhead to the server and blocked the handlers.
• Now the client handles the retries without any blocking.
45. 45 © Hortonworks Inc. 2011–2018. All rights reserved
Incremental Index Rebuilding
• When a lot of data (hours or days' worth) has accumulated after index failures, it is difficult to rebuild the indexes in a single pass.
• So in the latest version, index rebuilding happens in batches, checkpointed by timestamp, so that if a failure happens midway, rebuilding restarts from the last successful checkpoint.
• Automatic rebuilding stops when the user initiates a manual index rebuild.
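Batched rebuilding with a timestamp checkpoint can be sketched as below. This is a toy model under stated assumptions: rows are `(timestamp, key)` pairs, the checkpoint is held in a plain dict, and `rebuild_index` and `flaky_write` are hypothetical names rather than Phoenix APIs.

```python
def rebuild_index(rows, state, batch_span, write_index):
    """Rebuild index entries in timestamp windows, resuming from state['checkpoint']."""
    end = max((ts for ts, _ in rows), default=state["checkpoint"])
    while state["checkpoint"] < end:
        upper = min(state["checkpoint"] + batch_span, end)
        batch = [key for ts, key in rows if state["checkpoint"] < ts <= upper]
        write_index(batch)           # if this raises, the checkpoint is not advanced
        state["checkpoint"] = upper  # record progress only after the batch succeeds

rows = [(1, "a"), (2, "b"), (5, "c"), (7, "d"), (9, "e")]
state = {"checkpoint": 0}
index = []
calls = {"count": 0}

def flaky_write(batch):
    # Simulate a failure on the second batch only.
    calls["count"] += 1
    if calls["count"] == 2:
        raise RuntimeError("simulated index write failure")
    index.extend(batch)

try:
    rebuild_index(rows, state, 3, flaky_write)
except RuntimeError:
    print("failed; checkpoint kept at", state["checkpoint"])

rebuild_index(rows, state, 3, flaky_write)  # restarts from the last checkpoint
print(index, state["checkpoint"])
```

Only the failed window is repeated on the second call; rows covered by earlier successful batches are not processed again.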
46. 46 © Hortonworks Inc. 2011–2018. All rights reserved
MR based Index Rebuilding
• Alter index with REBUILD ASYNC option.
• Use IndexTool to rebuild the index.
47. 47 © Hortonworks Inc. 2011–2018. All rights reserved
Global Index performance Improvement
• Presplit the index table before asynchronous index building.
• Unless we know the data set of indexed column values, it is difficult to predefine the boundaries of the index table.
• An automatic process was implemented in version 4.14.0: the tool identifies approximate split keys by collecting index key values from a sample of the data table (or the full table); the sampling rate can be set from 0 to 100%.
• The tool builds an equi-depth histogram to identify the approximate index region boundaries.
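An equi-depth histogram places bucket boundaries so that each bucket holds roughly the same number of keys, which is what makes it suitable for choosing split points. The sketch below shows the idea on a list of strings; the function name `split_keys`, its parameters and the sampling approach are assumptions for illustration, not the Phoenix tool's code.

```python
import random

def split_keys(index_keys, num_regions, sampling_rate=1.0, seed=0):
    """Pick num_regions - 1 boundaries so each bucket holds about the same number of keys."""
    rng = random.Random(seed)
    # Keep each key with probability sampling_rate, then sort the sample.
    sample = sorted(k for k in index_keys if rng.random() < sampling_rate)
    if num_regions < 2 or len(sample) < num_regions:
        return []
    depth = len(sample) / num_regions  # target number of keys per bucket
    return [sample[int(i * depth)] for i in range(1, num_regions)]

keys = [f"user{n:03d}" for n in range(100)]
print(split_keys(keys, 4))
print(split_keys(keys, 4, sampling_rate=0.2))
```

With heavily repeated key values, adjacent boundaries can coincide, so a real tool would need to deduplicate the boundaries it produces.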
48. 48 © Hortonworks Inc. 2011–2018. All rights reserved
Local Index performance Improvements
• Until now, a scan through a local index had to touch all the regions irrespective of filters.
• In the current version, the regions to scan for local indexes can be filtered based on filter conditions on the leading primary key columns of the data table.
50. 50 © Hortonworks Inc. 2011–2018. All rights reserved
Apache Kafka Integration
• The Apache Kafka plugin helps to reliably and efficiently stream large amounts of data into HBase using the Phoenix API. As part of this plugin, a PhoenixConsumer is provided to receive messages from a KafkaProducer.
52. 52 © Hortonworks Inc. 2011–2018. All rights reserved
Support for GRANT and REVOKE commands
• Uses HBase ACLs only; the permissions are the same: R - Read, W - Write, X - Execute, C - Create and A - Admin.
• Access needs to be granted on the data table; it is automatically propagated to all its indexes and views
• WRITE access on the SYSTEM.CATALOG table is no longer needed for DDLs
53. 53 © Hortonworks Inc. 2011–2018. All rights reserved
Query log
• Enables centralized recall and analysis of queries executed against Phoenix.
• 18+ metrics are logged in the table
• Logging is done asynchronously and can be controlled through the log level (TRACE, DEBUG, INFO, OFF)
54. 54 © Hortonworks Inc. 2011–2018. All rights reserved
Python driver for Phoenix Query Server
• It implements the Python DB API 2.0 to access Phoenix via the Phoenix Query Server
• Example
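A DB API session against the Phoenix Query Server could look roughly like the sketch below. It assumes the third-party `phoenixdb` package is installed, that a Query Server is reachable at `http://localhost:8765/`, and it uses a hypothetical `users` table; the function name `run_demo` is also just for illustration.

```python
def run_demo(database_url="http://localhost:8765/"):
    """Create a small table, upsert one row and read it back through the Query Server."""
    # phoenixdb is a third-party package; it is imported here so that
    # this snippet can be loaded even where the package is not installed.
    import phoenixdb

    conn = phoenixdb.connect(database_url, autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, username VARCHAR)"
        )
        # Phoenix uses UPSERT rather than INSERT; '?' marks a bound parameter.
        cursor.execute("UPSERT INTO users VALUES (?, ?)", (1, "admin"))
        cursor.execute("SELECT id, username FROM users")
        return cursor.fetchall()
    finally:
        conn.close()
```

Calling `run_demo()` with the URL of a running Query Server returns the rows of the table as a list.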
55. 55 © Hortonworks Inc. 2011–2018. All rights reserved
Persistent Subquery Results Cache - WIP
• Persisting the data of expensive subqueries beyond the life of a query gives a significant performance gain when the same subquery is repeated in multiple queries.
• Makes use of the coprocessor-based server cache to avoid caching at the client.
• WIP at https://issues.apache.org/jira/browse/PHOENIX-4666
56. 56 © Hortonworks Inc. 2011–2018. All rights reserved
Upgrades/New
• Apache Omid integration for transaction support (along with Tephra)
• Hive 3.0
• Hadoop 3.0
• Spark 2.3.0
Editor's Notes
It may be possible to use fewer physical resources on the latest releases than on older ones because of performance improvements. As a personal opinion, it is always better to use the latest release with a patch number, as it is expected to be more stable, if you are not in a hurry to get new features.
Dependency Compatibility: An upgrade of HBase will not require an incompatible upgrade of a dependent project, including the Java runtime.
Operational Compatibility: Metric changes, Behavioral changes of services, JMX APIs exposed via the /jmx/ endpoint
Client Binary Compatibility: client code written to the APIs does not need recompilation
Rolling upgrade: you can’t run DDL with an old client
Each thread does 100-row multigets 1M times.
ByteBuffer classes are extended to optimize the storage of offset and length.
NIO direct byte buffers performed better than Netty byte buffers in a JMH comparison.
The Cell class is extended with new methods for accessing the off-heap ByteBuffer directly without copying; these methods are kept private, and the public interface for Cell is unchanged.
Some decisions, such as using an abstract class instead of an interface for ByteBufferCell, were based on performance benchmarks: Java C2 compiler inlining was not happening with the interface but was happening with the abstract class.
The graph was bad because of GC.
This optimizes the Java socket implementation when reading RPC requests, where the data is temporarily copied into an off-heap buffer before being copied to the destination on-heap buffer; with off-heap buffer pools, this temporary copy can be skipped and RPC requests read directly into an off-heap buffer.
MSLAB pooling allocates fixed-size byte buffers, similar to the traditional MSLAB pooling of on-heap byte arrays, to avoid memory fragmentation. We create a pool of fixed-size buffers, and cells are stored in a pipeline of these buffers.
MS
CSLM is ConcurrentSkipListMap (a memstore data structure). To avoid heap usage by CSLM metadata, a new data structure is used to keep the cells in a flattened array. The purpose of in-memory flushes is to keep data in memory longer for faster retrieval. The data structure is optimized to reduce the index overhead of CSLM.
Complements Tephra
ACID properties are hard to scale when databases have to deal with very large amounts of data and thousands of concurrent users, because the data must be partitioned, distributed and replicated.
Omid provides lock-free transactional support on top of HBase with snapshot isolation guarantees.