1 © Hortonworks Inc. 2011–2018. All rights reserved
Meet HBase 2.0 and Phoenix 5.0
Ankit Singhal and Rajeshbabu Chintaguntla
HBase & Phoenix Engineers at Hortonworks
About Us
Ankit Singhal
- Phoenix PMC
- HBase dev @Hortonworks
Rajeshbabu Chintaguntla
- Phoenix PMC and HBase Committer
- HBase dev @Hortonworks
Outline
Recent releases
Versioning / Compatibility
Changes in behavior
Major Features in HBase-2.0
Phoenix-5.0
2018 (repo and releases)
[Timeline figure: active branches and their releases]
• master: 3.0.0-SNAPSHOT
• branch-2: 2.1.0-SNAPSHOT
• branch-1: 1.5.0-SNAPSHOT
• branch-1.4, branch-1.3, branch-1.2, branch-1.1
• Releases shown: 1.1.0, 1.1.13, 1.2.0, 1.2.6, 1.3.1, 1.3.2, 1.4.3, 1.4.4, 1.4.5, 2.0.0-beta-1/2, 2.0.0, 2.1.0, 3.0.0 (some marked as expected)
How to choose a release
• If you are on 0.98.x or earlier, time to upgrade!
• 1.0 is EOL’ed. Use 1.3.x at least
• Both 1.3.2 and 1.4.4 are pretty stable
• Moving between minor versions is easy for 1.x
• Starting from scratch, use 1.4.4 or 2.0.1 (RC) / 2.1.0 (planned)
Semantic Versioning
Starting with the 1.0 release, HBase works toward Semantic Versioning
MAJOR.MINOR.PATCH[-identifiers]
PATCH: only backward-compatible (BC) bug fixes
MINOR: backward-compatible new features
MAJOR: incompatible changes
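
The versioning rule above can be sketched in a few lines. This is only an illustration of the MAJOR.MINOR.PATCH scheme, not code from HBase; the function names `parse_version` and `upgrade_kind` are made up for this example.

```python
import re
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse MAJOR.MINOR.PATCH, ignoring any trailing -identifiers."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)(?:-.+)?$", text)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return Version(*(int(part) for part in match.groups()))


def upgrade_kind(current: str, target: str) -> str:
    """Name the most significant component that differs between two versions."""
    old, new = parse_version(current), parse_version(target)
    if new.major != old.major:
        return "major"   # incompatible changes are allowed
    if new.minor != old.minor:
        return "minor"   # backward-compatible new features
    if new.patch != old.patch:
        return "patch"   # backward-compatible bug fixes only
    return "same"
```

For example, `upgrade_kind("1.3.2", "1.4.4")` classifies the move as a minor upgrade, while `upgrade_kind("1.4.4", "2.0.0")` is a major one and deserves a careful read of the upgrade notes.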
To be, or not to be (Compatible)
• Compatibility is NOT a simple yes or no
• Many dimensions: source, binary, wire, command line, dependencies, etc.
• Read https://hbase.apache.org/book.html#upgrading
HBase API surface
• Client API
• Explicitly marked with InterfaceAudience.Public
• Get/Put/Table/Connection, etc
• LimitedPrivate API
• Explicitly marked with InterfaceAudience.LimitedPrivate
• Coprocessors, replication APIs
• Private API
• Explicitly marked with InterfaceAudience.Private
• All other classes that are not marked are also private
• Also InterfaceStability.{Stable,Evolving,Unstable}
Compatibilities
                                         Major   Minor    Patch
Client-Server Wire Compatibility         ✗       ✓        ✓
Server-Server Compatibility              ✗       ✓        ✓
File Format Compatibility                ✗*      ✓        ✓
Client API Compatibility                 ✗       ✓        ✓
Client Binary Compatibility              ✗       ✗        ✓
Server Side Limited API Compatibility    ✗       ✗*/✓*    ✓
Dependency Compatibility                 ✗       ✓        ✓
Operation Compatibility                  ✗       ✗        ✓
Changes in behavior – HBase-2.0
• JDK-8+ only
• Hadoop-2.7+ and Hadoop-3 support
• Filter and Coprocessor changes
• Managed connections are gone.
• Rolling upgrade from 1.x is “not supported”
• Updated dependencies
• Deprecated client interfaces are gone
https://docs.google.com/document/d/1WCsVlnHjJeKUcl7wHwqb4z9iEu_ktczrlKHK8N4SZzs/edit#heading=h.v21r9nz8g01j
How to migrate to HBase-2.0
• 2.0 contains more API cleanup
• Cleans up PB (protobuf) and Guava “leaks” into the API
• Some deprecated APIs (HConnection, HTable, HBaseAdmin, etc.) are going away
• Start using JDK-8 (and G1). You will like it.
• A 1.x client should be able to read / write / scan against 2.0 clusters
• Some DDL / Admin operations may not work during a rolling upgrade
Offheaping Read Path
• Goals
• Make use of all available RAM
• Reduce temporary garbage for reading
• Make bucket cache as fast as LRU cache
• HBase block cache can be configured as L1 / L2
• “Bucket cache” can be backed by on-heap, off-heap or SSD
• Different sized buckets in 4MB slices
• HFile blocks placed in different buckets
• Previously, cell comparison could only deal with byte[]
• So the whole block had to be copied to the heap
https://www.slideshare.net/HBaseCon/offheaping-the-apache-hbase-read-path
Offheaping Read Path
https://blogs.apache.org/hbase/entry/offheaping_the_read_path_in1
Offheaping Read Path
Offheap Read-Path in Production - The Alibaba story (https://blogs.apache.org/hbase/)
Offheaping Write Path
• Reduce GC, improve throughput predictability
• Use off-heap buffer pools to read RPC requests
• Remove garbage created in Codec parsing, MSLAB copying, etc
• MSLAB is used to reduce heap fragmentation
• Now MSLAB can allocate off-heap pooled buffers [2MB]
Offheaping Write Path
• Allows bigger total memstore space
• Cell data goes off-heap
• Cell metadata (and CSLM#Entry, etc) is on-heap (~100 byte overhead)
• Async WAL is used to avoid copying Cells into heap
• Works with new “Compacting Memstore”
Compacting Memstore
• In memory flushes
• Reduce index overhead per cell
• Gains proportional to cell size
• In memory compaction
• Eliminate duplicates
• Gains are proportional to duplicates
• Reduce flushes to disk
• Reduce file creation
• Reduce write amplification
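
The duplicate-elimination idea behind in-memory compaction can be sketched as follows. This is a simplified illustration, not the HBase implementation: the `(row, column, timestamp, value)` cell layout and the `compact_in_memory` name are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# A cell is (row, column, timestamp, value); this layout is a simplification.
Cell = Tuple[str, str, int, bytes]


def compact_in_memory(segments: List[List[Cell]], max_versions: int = 1) -> List[Cell]:
    """Merge in-memory segments, keeping the newest max_versions cells per (row, column)."""
    by_key: Dict[Tuple[str, str], List[Cell]] = {}
    for segment in segments:
        for cell in segment:
            by_key.setdefault((cell[0], cell[1]), []).append(cell)

    compacted: List[Cell] = []
    for key in sorted(by_key):
        # Newest timestamp first, then drop versions beyond the limit
        newest_first = sorted(by_key[key], key=lambda c: c[2], reverse=True)
        compacted.extend(newest_first[:max_versions])
    return compacted
```

The more often the same row and column are overwritten, the more cells are dropped before anything is flushed to disk, which is why the gains are proportional to duplicates.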
Compacting Memstore
https://www.slideshare.net/HBaseCon/apache-hbase-accelerated-inmemory-flush-and-compaction
Async Client
• A completely new client maintained in HBase
• Different from OpenTSDB/asynchbase
• Fully async end to end
• Based on Netty and JDK-8 CompletableFutures
• Includes most critical functionality including Admin / DDL
• Some advanced functionality (read replicas, etc.) is not done yet
• Both sync and async clients are available
Region Assignment
• Region assignment is one of the pain points
• Complete overhaul of region assignment
• Based on the internal “Procedure V2” framework
• A series of operations is broken down into states of a state machine
• States are serialized to durable storage
• Uses a WAL on HDFS
• Backup master can recover the state from where it left off
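
The recovery idea above can be sketched with a toy state machine. This is an illustration only, not the Procedure V2 code: the state names, the `ProcedureLog` class (an in-memory stand-in for the WAL on HDFS), and `run_procedure` are all hypothetical.

```python
import json
from typing import Callable, Dict, List, Optional

# Hypothetical states for assigning a region; the real procedure has its own set.
STATES = ["QUEUE", "OPEN_ON_SERVER", "UPDATE_META", "DONE"]


class ProcedureLog:
    """In-memory stand-in for the durable log of procedure states."""

    def __init__(self) -> None:
        self.entries: List[str] = []

    def append(self, proc_id: int, state: str) -> None:
        self.entries.append(json.dumps({"proc_id": proc_id, "state": state}))

    def last_state(self, proc_id: int) -> str:
        state = STATES[0]
        for raw in self.entries:
            entry = json.loads(raw)
            if entry["proc_id"] == proc_id:
                state = entry["state"]
        return state


def run_procedure(proc_id: int, log: ProcedureLog,
                  steps: Dict[str, Callable[[], None]],
                  crash_before: Optional[str] = None) -> str:
    """Resume from the last persisted state and run until DONE or a simulated crash."""
    state = log.last_state(proc_id)
    while state != "DONE":
        if state == crash_before:
            return state            # simulate the active master dying here
        steps[state]()              # perform the work for this state
        state = STATES[STATES.index(state) + 1]
        log.append(proc_id, state)  # persist progress before moving on
    return state
```

A second call to `run_procedure` with the same log plays the role of the backup master: it reads the last persisted state and continues from there. In this sketch a crash between a step and its log entry would repeat that step, so steps would need to be idempotent.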
Backup / Restore
• Need a reliable native Backup mechanism
• Full backups and Incremental backups
• Backup destination to a remote FS
• HDFS, S3, ADLS, WASB, and any Hadoop-compatible FS
• Full backup based on “snapshots”
• Incremental backup ships Write-Ahead-Logs
• Backup + Replication are complementary (for a full DR solution)
• Flexible restore options
FileSystem Quotas
• HBase supports quotas
• Request count or size per sec, min, hour, day (per server)
• Num tables / num regions in namespace
• New work enables quotas for disk space usage
• Master periodically gathers info about disk space usage
• Regionservers enforce limit when notified from master
• Limits per table or per namespace (not per user)
• Violation policy: {DISABLE, NO_WRITES_COMPACTIONS, NO_WRITES, NO_INSERTS}
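
How a region server might apply a violation policy can be sketched like this. The policy names come from the list above; the mapping of policies to rejected operations, the operation names, and the "usage greater than limit" check are assumptions for illustration rather than the actual HBase logic.

```python
from enum import Enum


class ViolationPolicy(Enum):
    DISABLE = "DISABLE"
    NO_WRITES_COMPACTIONS = "NO_WRITES_COMPACTIONS"
    NO_WRITES = "NO_WRITES"
    NO_INSERTS = "NO_INSERTS"


READS = frozenset({"get", "scan"})
INSERTS = frozenset({"put", "increment", "append"})
WRITES = INSERTS | {"delete"}

# Operations each policy is assumed to reject once the quota is exceeded
BLOCKED = {
    ViolationPolicy.NO_INSERTS: INSERTS,
    ViolationPolicy.NO_WRITES: WRITES,
    ViolationPolicy.NO_WRITES_COMPACTIONS: WRITES | {"compaction"},
    ViolationPolicy.DISABLE: WRITES | READS | {"compaction"},
}


def is_allowed(operation: str, usage_bytes: int, limit_bytes: int,
               policy: ViolationPolicy) -> bool:
    """Return True if the operation may proceed under the space quota."""
    if usage_bytes <= limit_bytes:
        return True                  # quota not violated, nothing is blocked
    return operation not in BLOCKED[policy]
```

Under this reading, NO_INSERTS still lets deletes through so that users can free space, while the stricter policies progressively block more.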
• Much more complex when snapshots are in the picture (similar to quotas with hard links in FS)
• Snapshot usage is counted against the originating table
• A file is only counted once (even if the original table, a snapshot, or a restored table refers to it)
• More info: HBASE-16961
hbase> set_quota TYPE => SIZE, VIOLATION_POLICY => DELETES_WITH_COMPACTIONS, NAMESPACE => ‘prod_website’, LIMIT => ‘7125GB’
Will be released for the first time
• These will be released for the first time in Apache (although other backports exist)
• Medium Object Support
• Region Server groups
• Favored Nodes enhancements
• HBase-Spark module
• WALs and data files in separate FileSystems
Phoenix 5.0 - Outline
• API cleanup
• Index Improvements
• Join Improvements
• Major Features
HBase 0.98 name(s)                                   HBase 1.x + 2.x name(s)
HConnectionManager, ConnectionManager                ConnectionFactory
HConnection, ClusterConnection                       Connection
HBaseAdmin                                           Admin
HTable (HTableInterface)                             Table, RegionLocator
HTableDescriptor, HColumnDescriptor, HRegionInfo     TableDescriptor, ColumnFamilyDescriptor, RegionInfo
Join Improvements
Cost based Optimizations for Joins
• Phoenix supports two join algorithms
• Hash Join
• Sort Merge Join
• Prior to 4.14.0, users needed to explicitly hint the compiler to use Sort Merge Join when required.
• From 4.14.0 onwards, the best join algorithm is selected automatically based on the query's join predicate and ordering
• Sort-merge join wins over hash join when both children are ordered.
• Hash join wins over sort-merge join with the smaller side ordered.
• Hash join wins over sort-merge join without any side ordered.
• Hash join wins over sort-merge join with both sides ordered in an order-by query; the client needs to sort-merge the results.
• Multi-table join: a mix of join strategies chosen based on cost.
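
The rules of thumb above can be condensed into a small decision function. This is a sketch of those four rules only, not Phoenix's cost model; the `choose_join` name and its boolean inputs (whether each side is already ordered on the join key, and whether the query has an ORDER BY) are assumptions for the example.

```python
def choose_join(lhs_ordered: bool, rhs_ordered: bool, has_order_by: bool = False) -> str:
    """Pick a join algorithm following the rules of thumb listed above."""
    # Sort-merge join only wins when both inputs are already ordered
    # and the query does not add its own ORDER BY.
    if lhs_ordered and rhs_ordered and not has_order_by:
        return "SORT_MERGE"
    # One side ordered, no side ordered, or an order-by query: hash join wins.
    return "HASH"
```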
Index Improvements
Index Write Failure Handling
• Phoenix supports several strategies for handling index write failures:
• Block writes to the data table until the index is restored to a consistent state.
• Disable the index, recording the timestamp at which the failure occurred, and rebuild it in the background until it reaches a consistent state.
• Disable the index and rebuild it manually with the MR tool to avoid overhead on the server.
• Kill the RS so that the index updates are replayed and the index recovers to a consistent state.
Index Write Failure Handling (Cont’d)
Client Retries On Index Failure
• Prior to Phoenix 5.0.0, retries on index write failures happened on the server, and the index was disabled if the write still had not succeeded after multiple retries.
• This added unnecessary overhead on the server and blocked the handlers.
• Now the client handles the retries without any blocking.
Incremental Index Rebuilding
• When a lot of data (hours or days' worth) has accumulated after an index failure, it is difficult to rebuild the index in a single pass.
• So in the latest version, index rebuilding happens in batches, checkpointed by timestamp, so that if a failure happens in the middle, rebuilding restarts from the last successful checkpoint.
• Automatic rebuilding stops when the user initiates a manual index rebuild.
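
The batching-with-checkpoints idea can be sketched as below. This is a simplified model rather than Phoenix code: rows are `(timestamp, row key)` pairs, the checkpoint is a plain dict, and `rebuild_incrementally` and `apply_batch` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

# (timestamp, row key): a simplified stand-in for a data-table mutation
Row = Tuple[int, str]


def rebuild_incrementally(rows: List[Row], disabled_at: int, now: int, batch_span: int,
                          checkpoint: Dict[str, int],
                          apply_batch: Callable[[List[Row]], None]) -> int:
    """Replay rows with disabled_at <= ts < now in timestamp windows, checkpointing each one."""
    # Start from the saved checkpoint if there is one, else from the failure time
    start = checkpoint.setdefault("ts", disabled_at)
    while start < now:
        end = min(start + batch_span, now)
        batch = [row for row in rows if start <= row[0] < end]
        apply_batch(batch)        # may raise; the checkpoint then stays at `start`
        checkpoint["ts"] = end    # record progress only after the batch succeeds
        start = end
    return checkpoint["ts"]
```

If `apply_batch` fails part-way through the backlog, calling the function again with the same checkpoint redoes only the failed window and everything after it.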
MR based Index Rebuilding
• Alter index with REBUILD ASYNC option.
• Use IndexTool to rebuild the index.
Global Index performance Improvement
• Presplit the index table before asynchronous index building.
• Unless we know the distribution of the indexed column values, it is difficult to predefine the boundaries of the index table.
• An automatic process was implemented in 4.14.0: the tool identifies approximate split keys by collecting index key values from a sample of the data table (or the full table); the sampling rate can be set from 0 to 100%.
• The tool builds an equi-depth histogram to identify the approximate index region boundaries.
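
An equi-depth histogram places bucket boundaries so that each bucket holds roughly the same number of keys. A minimal sketch of deriving split keys that way is shown below; it is not the Phoenix tool itself, and the `equi_depth_split_keys` function, its parameters, and the seeded sampling are assumptions for illustration.

```python
import random
from typing import List, Sequence


def equi_depth_split_keys(index_keys: Sequence[str], num_regions: int,
                          sampling_rate: float = 1.0, seed: int = 0) -> List[str]:
    """Return up to num_regions - 1 split keys with roughly equal key counts between them."""
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0.0 and 1.0")

    rng = random.Random(seed)
    # Keep each key with probability sampling_rate, then sort the sample
    sample = sorted(key for key in index_keys if rng.random() < sampling_rate)
    if not sample:
        return []

    split_keys: List[str] = []
    for bucket in range(1, num_regions):
        boundary = sample[bucket * len(sample) // num_regions]
        # Skewed data can yield the same boundary twice; keep split keys distinct
        if not split_keys or boundary != split_keys[-1]:
            split_keys.append(boundary)
    return split_keys
```

With heavily skewed values the function returns fewer split keys than requested, because one value cannot be spread across several regions by key range alone.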
Local Index performance Improvements
• Until now, a scan through a local index had to touch all the regions, irrespective of filters.
• In the current version, the regions to scan for local indexes can be filtered based on filter conditions on the data table's leading primary key columns.
Kafka Integration
Apache Kafka Integration
• The Apache Kafka plugin helps to reliably and efficiently stream large amounts of data into HBase using the Phoenix API. As part of this plugin, a PhoenixConsumer is provided to receive messages from a KafkaProducer.
Major Features
Support for GRANT and REVOKE commands
• Uses HBase ACLs only; the permissions are the same: R (Read), W (Write), X (Execute), C (Create) and A (Admin).
• Access needs to be given on the data table; it is automatically propagated to all its indexes and views
• No need to have WRITE access on the SYSTEM.CATALOG table for DDLs anymore
Query log
• Enables centralized recall and analysis of queries executed against Phoenix.
• 18+ metrics are logged in the table
• Logging is done asynchronously and can be controlled through the log level (TRACE, DEBUG, INFO, OFF)
Python driver for Phoenix Query Server
• It implements the Python DB API 2.0 to access Phoenix via the Phoenix Query Server
• Example
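
The example shown on the original slide did not survive in this transcript. As a stand-in, here is a minimal sketch of typical driver usage, assuming the third-party `phoenixdb` package is installed and a Phoenix Query Server is listening at `http://localhost:8765/`; the `users` table, its columns, and the `run_phoenix_example` wrapper are illustrative choices.

```python
def run_phoenix_example(database_url: str = "http://localhost:8765/"):
    """Create a table, upsert a row, and read it back through the Phoenix Query Server."""
    import phoenixdb  # third-party driver; imported lazily so this file loads without it

    conn = phoenixdb.connect(database_url, autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, username VARCHAR)"
        )
        # Phoenix uses UPSERT rather than INSERT; parameters use the qmark style
        cursor.execute("UPSERT INTO users VALUES (?, ?)", (1, "admin"))
        cursor.execute("SELECT id, username FROM users")
        return cursor.fetchall()
    finally:
        conn.close()
```

Because the driver follows the DB API, the connect / cursor / execute / fetch flow is the same as with other Python database drivers.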
Persistent Subquery Results Cache - WIP
• Persisting the data of expensive subqueries beyond the life of a query gives a significant performance gain when the same subquery is repeated in multiple queries.
• Makes use of a coprocessor-based server cache to avoid caching at the client.
• WIP at https://issues.apache.org/jira/browse/PHOENIX-4666
Upgrades/New
• Apache Omid integration for transaction support (along with Tephra)
• Hive 3.0
• Hadoop 3.0
• Spark 2.3.0
Questions?
Thank you

More Related Content

What's hot

Enabling ABAC with Accumulo and Ranger integration
Enabling ABAC with Accumulo and Ranger integrationEnabling ABAC with Accumulo and Ranger integration
Enabling ABAC with Accumulo and Ranger integration
DataWorks Summit
 

What's hot (20)

Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0
 
Accelerating query processing
Accelerating query processingAccelerating query processing
Accelerating query processing
 
Apache Hive on ACID
Apache Hive on ACIDApache Hive on ACID
Apache Hive on ACID
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0
 
Transactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and futureTransactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and future
 
Hive acid and_2.x new_features
Hive acid and_2.x new_featuresHive acid and_2.x new_features
Hive acid and_2.x new_features
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0
 
Database as a Service - Tutorial @ICDE 2010
Database as a Service - Tutorial @ICDE 2010Database as a Service - Tutorial @ICDE 2010
Database as a Service - Tutorial @ICDE 2010
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
 
Apache Hive ACID Project
Apache Hive ACID ProjectApache Hive ACID Project
Apache Hive ACID Project
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
 
CBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFSCBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFS
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's Evolution
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Enabling ABAC with Accumulo and Ranger integration
Enabling ABAC with Accumulo and Ranger integrationEnabling ABAC with Accumulo and Ranger integration
Enabling ABAC with Accumulo and Ranger integration
 
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
 

Similar to Meet HBase 2.0 and Phoenix-5.0

HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
enissoz
 

Similar to Meet HBase 2.0 and Phoenix-5.0 (20)

Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0
 
Meet Apache HBase - 2.0
Meet Apache HBase - 2.0Meet Apache HBase - 2.0
Meet Apache HBase - 2.0
 
Meet HBase 2.0
Meet HBase 2.0Meet HBase 2.0
Meet HBase 2.0
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive
 
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?
 
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
 
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseHBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Apache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseApache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAse
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
 
Apache Phoenix and HBase - Hadoop Summit Tokyo, Japan
Apache Phoenix and HBase - Hadoop Summit Tokyo, JapanApache Phoenix and HBase - Hadoop Summit Tokyo, Japan
Apache Phoenix and HBase - Hadoop Summit Tokyo, Japan
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
 
Apache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandApache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to Understand
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
 
Apache Hive 2.0; SQL, Speed, Scale
Apache Hive 2.0; SQL, Speed, ScaleApache Hive 2.0; SQL, Speed, Scale
Apache Hive 2.0; SQL, Speed, Scale
 
HBaseCon 2015: Meet HBase 1.0
HBaseCon 2015: Meet HBase 1.0HBaseCon 2015: Meet HBase 1.0
HBaseCon 2015: Meet HBase 1.0
 
Meet HBase 1.0
Meet HBase 1.0Meet HBase 1.0
Meet HBase 1.0
 

More from DataWorks Summit

HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Recently uploaded (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Meet HBase 2.0 and Phoenix-5.0

  • 1. Meet HBase 2.0 and Phoenix 5.0. Ankit Singhal and Rajeshbabu Chintaguntla, HBase & Phoenix Engineers at Hortonworks. © Hortonworks Inc. 2011–2018. All rights reserved
  • 2. About Us
      • Ankit Singhal: Phoenix PMC, HBase dev @Hortonworks
      • Rajeshbabu Chintaguntla: Phoenix PMC and HBase Committer, HBase dev @Hortonworks
  • 3. Outline: Recent releases, Versioning / Compatibility, Changes in behavior, Major Features in HBase-2.0, Phoenix-5.0
  • 4. Outline: Recent releases, Versioning / Compatibility, Changes in behavior, Major Features in HBase-2.0, Phoenix-5.0. Picture taken from: https://www.ostraining.com/blog/joomla/content-versioning-a-sneak-peek-at-joomla-s-newest-proposed-feature/
  • 5. 2018 (repo and releases), diagram of branches and releases
      • Branches: master (3.0.0-SNAPSHOT), branch-2 (2.1.0-SNAPSHOT), branch-1 (1.5.0-SNAPSHOT), branch-1.4, branch-1.3, branch-1.2, branch-1.1
      • Releases marked: 1.1.0, 1.1.13, 1.2.0, 1.2.6, 1.3.1, 1.3.2, 1.4.3, 1.4.4, 1.4.5, 2.0.0-beta-1/-2, 2.0.0, 2.1.0 (expected), 3.0.0
  • 6. How to choose a release
      • If you are on 0.98.x or earlier, time to upgrade!
      • 1.0 is EOL. Use at least 1.3.x
      • Both 1.3.2 and 1.4.4 are pretty stable
      • Moving between minor versions is easy for 1.x
      • Starting from scratch, use 1.4.4 or 2.0.1 (RC) / 2.1.0 (planned)
      • Picture taken from: http://meritcd.com/blogs/improve-your-decision-making-improve-your-leadership-2/
  • 7. Outline: Recent releases, Versioning / Compatibility, Changes in behavior, Major Features in HBase-2.0, Phoenix-5.0
  • 8. Semantic Versioning
      • Starting with the 1.0 release, HBase works toward Semantic Versioning: MAJOR.MINOR.PATCH[-identifiers]
      • PATCH: only backward-compatible (BC) bug fixes
      • MINOR: backward-compatible new features
      • MAJOR: incompatible changes
  • 9. To be, or not to be (Compatible)
  • 10. To be, or not to be (Compatible)
      • Compatibility is NOT a simple yes or no
      • Many dimensions: source, binary, wire, command line, dependencies, etc.
      • Read https://hbase.apache.org/book.html#upgrading
  • 11. HBase API surface
      • Client API
          • Explicitly marked with InterfaceAudience.Public
          • Get/Put/Table/Connection, etc.
      • LimitedPrivate API
          • Explicitly marked with InterfaceAudience.LimitedPrivate
          • Coprocessors, replication APIs
      • Private API
          • Explicitly marked with InterfaceAudience.Private
          • All other classes not marked
      • Also InterfaceAudience.{Stable,Evolving,Unstable}
  • 12. Compatibilities (by release type: Major / Minor / Patch)
      • Client-Server Wire Compatibility: Major ✗, Minor ✓, Patch ✓
      • Server-Server Compatibility: Major ✗, Minor ✓, Patch ✓
      • File Format Compatibility: Major ✗*, Minor ✓, Patch ✓
      • Client API Compatibility: Major ✗, Minor ✓, Patch ✓
      • Client Binary Compatibility: Major ✗, Minor ✗, Patch ✓
      • Server Side Limited API Compatibility: Major ✗, Minor ✗*/✓*, Patch ✓
      • Dependency Compatibility: Major ✗, Minor ✓, Patch ✓
      • Operation Compatibility: Major ✗, Minor ✗, Patch ✓
  • 13. Outline: Recent releases, Versioning / Compatibility, Changes in behavior, Major Features in HBase-2.0, Phoenix-5.0
  • 14. Changes in behavior: HBase-2.0
      • JDK-8+ only
      • Hadoop-2.7+ and Hadoop-3 support
      • Filter and Coprocessor changes
      • Managed connections are gone
      • Rolling upgrade from 1.x is not supported
      • Updated dependencies
      • Deprecated client interfaces are gone
      • https://docs.google.com/document/d/1WCsVlnHjJeKUcl7wHwqb4z9iEu_ktczrlKHK8N4SZzs/edit#heading=h.v21r9nz8g01j
  • 15. How to migrate to HBase-2.0
      • 2.0 contains more API cleanup
      • Cleans up PB and guava "leaks" into the API
      • Some deprecated APIs (HConnection, HTable, HBaseAdmin, etc.) are going away
      • Start using JDK-8 (and G1). You will like it.
      • A 1.x client should be able to read / write / scan against 2.0 clusters
      • Some DDL / Admin operations may not work during a rolling upgrade
  • 16. Outline: Recent releases, Versioning / Compatibility, Changes in behavior, Major Features in HBase-2.0, Phoenix-5.0
  • 17. Offheaping Read Path
      • Goals
          • Make use of all available RAM
          • Reduce temporary garbage for reading
          • Make bucket cache as fast as LRU cache
      • HBase block cache can be configured as L1 / L2
      • "Bucket cache" can be backed by on-heap, off-heap or SSD
      • Different sized buckets in 4MB slices
      • HFile blocks placed in different buckets
      • Cell comparison only used to deal with byte[]
      • Copy the whole block to heap
  • 18. https://www.slideshare.net/HBaseCon/offheaping-the-apache-hbase-read-path
  • 19. https://www.slideshare.net/HBaseCon/offheaping-the-apache-hbase-read-path
  • 20. Offheaping Read Path: https://blogs.apache.org/hbase/entry/offheaping_the_read_path_in1
  • 21. Offheaping Read Path: Offheap Read-Path in Production - The Alibaba story (https://blogs.apache.org/hbase/)
  • 22. Offheaping Write Path
      • Reduce GC, improve throughput predictability
      • Use off-heap buffer pools to read RPC requests
      • Remove garbage created in Codec parsing, MSLAB copying, etc.
      • MSLAB is used to reduce heap fragmentation
      • Now MSLAB can allocate off-heap pooled buffers [2MB]
  • 23. Offheaping Write Path
      • Allows bigger total memstore space
      • Cell data goes off-heap
      • Cell metadata (and CSLM#Entry, etc.) is on-heap (~100 byte overhead)
      • Async WAL is used to avoid copying Cells into heap
      • Works with the new "Compacting Memstore"
  • 24. Compacting Memstore
      • In-memory flushes
          • Reduce index overhead per cell
          • Gains proportional to cell size
      • In-memory compaction
          • Eliminate duplicates
          • Gains are proportional to duplicates
      • Reduce flushes to disk
          • Reduce file creation
          • Reduce write amplification
  • 25. Compacting Memstore: https://www.slideshare.net/HBaseCon/apache-hbase-accelerated-inmemory-flush-and-compaction
  • 26. https://www.slideshare.net/HBaseCon/apache-hbase-accelerated-inmemory-flush-and-compaction
  • 27. Async Client
      • A completely new client maintained in HBase
      • Different from OpenTSDB/asynchbase
      • Fully async end to end
      • Based on Netty and JDK-8 CompletableFutures
      • Includes the most critical functionality, including Admin / DDL
      • Some advanced functionality (read replicas, etc.) is not done yet
      • Both sync and async clients are available
  • 28. Async Client
  • 29. Region Assignment
      • Region assignment is one of the pain points
      • Complete overhaul of region assignment
      • Based on the internal "Procedure V2" framework
      • A series of operations is broken down into states in a State Machine
      • States are serialized to durable storage
      • Uses a WAL on HDFS
      • A backup master can recover the state and resume from where it left off
  • 30. Backup / Restore
      • Need a reliable native backup mechanism
  • 31. Backup / Restore
      • Full backups and incremental backups
      • Backup destination can be a remote FS: HDFS, S3, ADLS, WASB, and any Hadoop-compatible FS
      • Full backup is based on "snapshots"
      • Incremental backup ships Write-Ahead-Logs
      • Backup + Replication are complementary (for a full DR solution)
  • 32. Backup / Restore
      • Flexible restore options
  • 33. FileSystem Quotas
      • HBase supports quotas
          • Request count or size per sec, min, hour, day (per server)
          • Num tables / num regions in namespace
      • New work enables quotas for disk space usage
          • Master periodically gathers info about disk space usage
          • Regionservers enforce the limit when notified by the master
          • Limits per table or per namespace (not per user)
          • Violation policy: {DISABLE, NO_WRITES_COMPACTIONS, NO_WRITES, NO_INSERTS}
  • 34. FileSystem Quotas
      • Much more complex when snapshots are in the picture (similar to quotas with hard links in a FS)
      • Snapshot usage is counted against the originating table
      • A file is counted only once (even if the original table, a snapshot, or a restored table refers to it)
      • More info: HBASE-16961
      • (truncated on the slide) => 'prod_website', LIMIT => '7125GB'
      • hbase> set_quota TYPE => SIZE, VIOLATION_POLICY => DELETES_WITH_COMPACTIONS, NAMESPACE => 'prod_website', LIMIT => '7125GB'
  • 35. Will be released for the first time
      • These will be released for the first time in Apache (although other backports exist)
      • Medium Object Support
      • Region Server groups
      • Favored Nodes enhancements
      • HBase-Spark module
      • WALs and data files in separate FileSystems
  • 36. Outline: Recent releases, Versioning / Compatibility, Changes in behavior, Major Features in HBase-2.0, Phoenix-5.0
  • 37. Phoenix 5.0 - Outline
      • API cleanup
      • Index Improvements
      • Join Improvements
      • Major Features
  • 38. HBase 0.98 name(s) → HBase 1.x + 2.x name(s)
      • HConnectionManager, ConnectionManager → ConnectionFactory
      • HConnection, ClusterConnection → Connection
      • HBaseAdmin → Admin
      • HTable (HTableInterface) → Table, RegionLocator
      • HTableDescriptor, HColumnDescriptor, HRegionInfo → TableDescriptor, ColumnFamilyDescriptor, RegionInfo
  • 39. Join Improvements
  • 40. Cost based Optimizations for Joins
      • Phoenix supports two join algorithms: Hash Join and Sort Merge Join
      • Prior to 4.14.0, users needed to explicitly hint the compiler to use Sort Merge Join when required
      • From 4.14.0 onwards, the best join algorithm is selected automatically based on the query's join predicate and ordering
  • 41.
      • Sort-merge join wins over hash join when both children are ordered
      • Hash join wins over sort-merge join when only the smaller side is ordered
      • Hash join wins over sort-merge join when neither side is ordered
      • Hash join wins over sort-merge join when both sides are ordered in an order-by query, because the client needs to sort-merge the results
      • Multi-table join: a mix of join strategies is chosen based on cost
  • 42. Index Improvements
  • 43. Index Write Failure Handling
      • Phoenix supports several strategies for handling index write failures:
      • Block writes to the data table until the index is restored to a consistent state
      • Disable the index with the timestamp at which the failure occurred, and rebuild the index in the background until a consistent state is reached
      • Disable the index and rebuild it manually with the MR tool to avoid overhead on the server
      • Kill the RS so that the index updates are replayed or recovered to a consistent state
  • 44. Index Write Failure Handling (Cont'd): Client Retries On Index Failure
      • Prior to Phoenix 5.0.0, retries on index write failures happened at the server, and the index was disabled if the write still had not succeeded after multiple retries
      • This added unnecessary overhead to the server and blocked the handlers
      • Now the client handles the retries without any blocking
  • 45. Incremental Index Rebuilding
      • When a lot of data (hours or days' worth) has accumulated after an index failure, it is difficult to rebuild the index in a single pass
      • So in the latest version, index rebuilding happens in batches, checkpointed by timestamp, so that if a failure happens midway, rebuilding restarts from the last successful checkpoint
      • Automatic rebuilding stops when the user initiates a manual index rebuild
  • 46. MR based Index Rebuilding
      • Alter the index with the REBUILD ASYNC option
      • Use IndexTool to rebuild the index
  • 47. Global Index performance Improvement
      • Presplit the index table before asynchronous index building
      • Unless we know the data set of indexed column values, it is difficult to predefine the boundaries of the index table
      • An automatic process was implemented in version 4.14.0: the tool identifies approximate split keys by collecting index key values from a sample of the data table (or the full table); the sampling rate can be set from 0 to 100%
      • The tool builds an equi-depth histogram to identify the approximate index region boundaries
  • 48. Local Index performance Improvements
      • Until now, a scan through a local index had to touch all the regions irrespective of filters
      • In the current version, the regions to scan for local indexes can be filtered based on filter conditions on the leading primary key columns of the data table
  • 49. Kafka Integration
  • 50. Apache Kafka Integration
      • The Apache Kafka plugin helps to reliably and efficiently stream large amounts of data onto HBase using the Phoenix API. As part of this plugin, PhoenixConsumer is provided to receive messages from KafkaProducer.
  • 51. Major Features
  • 52. Support for GRANT and REVOKE commands
      • Uses HBase ACLs only; the permissions are the same: R - Read, W - Write, X - Execute, C - Create and A - Admin
      • Access needs to be given on the data table and is automatically propagated to all its indexes and views
      • WRITE access on the SYSTEM.CATALOG table is no longer needed for DDLs
  • 53. Query log
      • Enables centralized recall and analysis of queries executed against Phoenix
      • 18+ metrics are logged in the table
      • Logging is done asynchronously and can be controlled through the log level (TRACE, DEBUG, INFO, OFF)
  • 54. Python driver for Phoenix Query Server
      • Implements the Python DB API 2.0 to access Phoenix via the Phoenix Query Server
      • Example
  • 55. Persistent Subquery Results Cache - WIP
      • Persisting the data of expensive subqueries beyond the life of a query gives a significant performance gain when the same subquery is repeated in multiple queries
      • Makes use of the coprocessor-based server cache to avoid caching at the client
      • WIP at https://issues.apache.org/jira/browse/PHOENIX-4666
  • 56. Upgrades/New
      • Apache Omid integration for transaction support (along with Tephra)
      • Hive 3.0
      • Hadoop 3.0
      • Spark 2.3.0
  • 57. Questions?
  • 58. Thank you
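
Slide 47 describes pre-splitting a global index table by sampling index key values and building an equi-depth histogram. The following is a minimal Python sketch of that idea, not Phoenix's actual implementation: the function names, the seeded random sampler and the use of plain byte strings as index keys are all assumptions made for illustration.

```python
import random


def sample_index_keys(index_keys, sampling_rate, seed=0):
    """Keep each key with probability sampling_rate (0.0 to 1.0).

    Hypothetical helper: stands in for scanning a sample of the data table.
    """
    rng = random.Random(seed)
    return [key for key in index_keys if rng.random() < sampling_rate]


def equi_depth_split_keys(sampled_keys, num_regions):
    """Return at most num_regions - 1 split keys.

    The sorted sample is cut into num_regions buckets holding roughly the
    same number of keys; the first key of every bucket after the first is a
    candidate split key. Candidates that would create an empty region
    (duplicates, or the smallest key) are skipped.
    """
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    keys = sorted(sampled_keys)
    if not keys:
        return []
    splits = []
    for i in range(1, num_regions):
        candidate = keys[i * len(keys) // num_regions]
        last = splits[-1] if splits else keys[0]
        if candidate > last:
            splits.append(candidate)
    return splits


if __name__ == "__main__":
    all_keys = [f"k{i:03d}".encode() for i in range(100)]
    sample = sample_index_keys(all_keys, sampling_rate=0.5)
    print(equi_depth_split_keys(sample, num_regions=4))
```

With skewed data the sketch returns fewer split keys than requested, because a heavily repeated value cannot be divided between regions.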

Editor's Notes

  1. It may be possible to use fewer physical resources on the latest releases than on older ones because of performance improvements.
  2. In my personal opinion, it is always better to use the latest release that has a patch number, as it is expected to be more stable, if you are not in a hurry to get new features.
  3. Dependency Compatibility: an upgrade of HBase will not require an incompatible upgrade of a dependent project, including the Java runtime. Operational Compatibility: metric changes, behavioral changes of services, JMX APIs exposed via the /jmx/ endpoint. Client Binary Compatibility: client code written to the APIs does not require recompilation.
  4. During a rolling upgrade you can't run DDL with the old client.
  5. Each thread does 100-row multigets 1M times.
  6. ByteBuffer classes are extended to optimize the storage of offset and length. NIO direct byte buffers performed better than Netty byte buffers in a JMH comparison. The Cell class is extended with new methods for accessing the ByteBuffer off-heap directly without copying; these methods are kept private and do not change the public interface of the cell. Some decisions, such as using an abstract class instead of an interface for ByteBufferCell, were based on performance benchmarks, because inlining by the Java C2 (JIT) compiler was not happening with the interface but was happening with the abstract class. The graph was bad because of GC.
  7. When reading RPC requests, the Java socket implementation makes a temporary copy of the data in an off-heap buffer before copying it to the destination on-heap buffer; with off-heap buffer pools we can skip this temporary copy and read RPC requests directly into an off-heap buffer. MSLAB pooling allocates fixed-size byte buffers, similar to the traditional MSLAB pooling of on-heap byte arrays, to avoid memory fragmentation. We create a pool of fixed-size buffers, and cells are stored in a pipeline of these buffers.
  8. CSLM is ConcurrentSkipListMap (the memstore data structure). To avoid heap usage by CSLM metadata, a new data structure is used to keep the cells in a flattened array.
  9. The purpose of in-memory flushes is to keep data in memory longer for faster retrieval. The data structure is optimized to reduce the index overhead of CSLM.
  10. Complements Tephra. ACID properties are hard to scale when databases have to deal with very large amounts of data and thousands of concurrent users, because the data must be partitioned, distributed and replicated. Omid provides lock-free transactional support on top of HBase with snapshot isolation guarantees.
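
The compatibility matrix on slide 12, together with the MAJOR.MINOR.PATCH scheme on slide 8, can be read as a lookup table. The sketch below encodes it in Python as an illustration only; the dictionary keys and function names are made up for this example, the footnoted File Format cell (✗*) is treated as "not guaranteed", and the ambiguous ✗*/✓* cell is stored as `None` because the footnotes are not part of the transcript.

```python
# Compatibility guarantees by upgrade type, transcribed from slide 12.
# True = guaranteed, False = not guaranteed, None = conditional (footnoted).
COMPATIBILITY = {
    "client-server wire":      {"major": False, "minor": True,  "patch": True},
    "server-server":           {"major": False, "minor": True,  "patch": True},
    "file format":             {"major": False, "minor": True,  "patch": True},
    "client api":              {"major": False, "minor": True,  "patch": True},
    "client binary":           {"major": False, "minor": False, "patch": True},
    "server side limited api": {"major": False, "minor": None,  "patch": True},
    "dependency":              {"major": False, "minor": True,  "patch": True},
    "operation":               {"major": False, "minor": False, "patch": True},
}


def parse_version(version):
    """Parse 'MAJOR.MINOR.PATCH[-identifiers]' into a tuple of three ints."""
    core = version.split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def upgrade_kind(old, new):
    """Classify an upgrade as 'major', 'minor' or 'patch'."""
    old_v, new_v = parse_version(old), parse_version(new)
    if old_v[0] != new_v[0]:
        return "major"
    if old_v[1] != new_v[1]:
        return "minor"
    return "patch"


def guaranteed_dimensions(old, new):
    """List the compatibility dimensions guaranteed for this upgrade."""
    kind = upgrade_kind(old, new)
    return sorted(name for name, row in COMPATIBILITY.items() if row[kind] is True)


if __name__ == "__main__":
    for old, new in [("1.4.3", "1.4.4"), ("1.3.2", "1.4.4"), ("1.4.4", "2.0.0")]:
        print(old, "->", new, upgrade_kind(old, new), guaranteed_dimensions(old, new))
```

For example, moving from 1.3.2 to 1.4.4 is classified as a minor upgrade, so client API compatibility is listed but client binary compatibility is not.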