SlideShare a Scribd company logo
1 of 69
TRANSACTIONS OVER HBASE
Alex Baranau @abaranau	

Gary Helmling @gario	

!
Continuuity
WHO WE ARE
• We’ve built Continuuity Reactor: the world’s first scale-out
application server for Hadoop
• Fast, easy development, deployment and management of
Hadoop and HBase apps
• Continuuity team has years of experience in using and contributing
to Open Source, and we intend to continue doing so.
AGENDA
• Transactions over HBase: Why? What?
• Implementation: How?
• The approach
• Transaction Manager
• Transaction in batch jobs
THE REACTOR
• Continuuity Reactor is an app platform built on Hadoop and HBase
• Collect, Process, Store, and Query data.
• A Flow is a real-time processor with exactly-once guarantee
• A flow is composed of flowlets, connected via queues
• All processing happens with ACID guarantees in transactions
HBase
Table
PROCESSING IN A FLOW
...Queue ...
...
Flowlet
... ...
HBase
Table
PROCESSING IN A FLOW
...Queue ...
...
Flowlet
... ...
HBase
Table
WRAP IN TRANSACTION!
...Queue ...
...
Flowlet
TRANSACTIONS: WHAT?
• Atomic - Entire transaction is committed as one
• Consistent - No partial state change due to failure
• Isolated - No dirty reads, transaction is only visible
after commit
• Durable - Once committed, data is persisted reliably
HBASE: QUICK “WHAT?”
• Open-source non-relational distributed column-
oriented database modeled after Google’s BigTable
• Data is stored in named Tables of Rows
• Row consists of Cells:

(row key, column family, column, timestamp) -> value
• Table is split into Regions, each served by a single node
in HBase cluster
WHAT ABOUT HBASE?
• Atomic operations on cell value: 

checkAndPut, checkAndDelete, increment, append
• Atomic batch of operations on rows within region
• No cross region atomic operations support
• No cross table atomic operations support
• No multi-RPC atomic operations support
TRANSACTIONS: WHY ELSE?
• Complex data access patterns
• Secondary Indexes
• Design for greater performance
• Efficient batching of data operations
• Client-side buffer reliability fix
IMPLEMENTATION
• Multi-Version Concurrency Control
• Cell version (timestamp) = transaction ID
• All writes in the same transaction use the transaction ID as timestamp
• Reads exclude other, uncommitted transactions (for isolation)
• Optimistic Concurrency Control
• Conflict detection at commit of transaction
• Write Conflict: two overlapping transactions write the same row
• Rollback of one transaction in case of conflict (whichever commits later)
OPTIMISTIC CONCURRENCY
CONTROL
• Avoids cost of locking rows and tables
• No deadlocks or lock escalations
• Cost of conflict detection and possible rollback
is higher
• Good if conflicts are rare: short transaction,
disjoint partitioning of work
ZooKeeper
TRANSACTIONS IN CONTEXT
Tx Manager	

(standby)
HBase
Master 1
Master 2	

RS 1
RS 2 RS 4
RS 3
Client 1
Client 2
Client N
Tx Manager	

(active)
TRANSACTION LIFE CYCLE
time
out
try abort
failed
roll back
in HBase
write
to
HBase
do work
Client Tx Manager
none
complete V
abortsucceeded
in progress
start tx
start
start tx
commit
try commit check conflicts
RPC API
invalid X
invalidate
failed
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
Tx Manager
Client 1
Client 2
write = 1002
read = 1001
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
Tx Manager
Client 1
start
write = 1002
read = 1001
Client 2
write = 1002
read = 1001
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
Tx Manager
Client 1
start
write = 1002
read = 1001
Client 2
write = 1003
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
Tx Manager
Client 1
increment
write = 1002
read = 1001
Client 2
write = 1003
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
Tx Manager
Client 1
increment
write = 1002
read = 1001
Client 2
write = 1003
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
Tx Manager
Client 1 start
write = 1002
read = 1001
Client 2
write = 1003
read = 1001
inprogress=[1002]
write = 1003
read = 1001
excluded=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
Tx Manager
Client 1 start
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002, 1003]
write = 1003
read = 1001
excluded=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
increment
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002, 1003]
write = 1003
read = 1001
excluded=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
commit
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002, 1003]
write = 1003
read = 1001
excluded=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
commit
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
write = 1003
read = 1001
excluded=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
commit
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
conflict!
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1 rollback
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1003 11
Tx Manager
Client 1 rollback
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1003 11
Tx Manager
Client 1
abort
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1003 11
Tx Manager
Client 1
abort
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1003 11
Tx Manager
Client 1
abort
write = 1002
read = 1001
Client 2
write = 1004
read = 1003
inprogress=[]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1003 11
Tx Manager
Client 1 start
Client 2
write = 1005
read = 1003
inprogress=[]
write = 1004
read = 1003
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1003 11
Tx Manager
Client 1
read
Client 2
write = 1005
read = 1003
inprogress=[]
write = 1004
read = 1003
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
conflict!
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1 rollback
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1 rollback failed
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
invalidate
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
invalidate
write = 1002
read = 1001
Client 2
write = 1004
read = 1003
inprogress=[]
invalid=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1 start
Client 2
write = 1005
read = 1003
inprogress=[]
invalid=[1002]
write = 1004
read = 1003
exclude = [1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
read
Client 2
write = 1005
read = 1003
inprogress=[]
invalid=[1002]
write = 1004
read = 1003
exclude = [1002]
invisible!
TRANSACTION MANAGER
• Create new transactions
• Provides monotonically increasing write pointers
• Maintains all in-progress, committed, and invalid transactions
• Detect conflicts
• Transaction =
	 	 	 Write Pointer: Timestamp for HBase writes
	 	 	 Read pointer: Upper bound timestamp for reads
	 	 	 Excludes: List of timestamps to exclude from reads
TRANSACTION MANAGER
• Simple & Fast
• All required state is in-memory
• Single point of failure?
• Persist all state to a write-ahead log
• Secondary Tx Manager watches for failure of Primary
• Failover can happen quickly
TRANSACTION MANAGER
Tx Manager
Current State
in progress
committed
invalid
read point
write point
start()
TRANSACTION MANAGER
Tx Manager
Current State
in progress (+)
committed
invalid
read point
write point ++
start()
Tx Log
started, <write pt>
HDFS
TRANSACTION MANAGER
Tx Manager
Current State
in progress (-)
committed (+)
invalid
read point
write point
commit()
Tx Log
start, <write pt>
commit, <write pt>
HDFS
TRANSACTION SNAPSHOTS
• Write-ahead log provides persistence
• Guarantees point-in-time recovery
• Longer the log grows, longer recovery takes
• Periodically write snapshot of full transaction state
• Snapshot + all new logs provides full state
Tx Manager
Current State
TRANSACTION SNAPSHOTS
Tx Log A
in progress
committed
invalid
read point
write point
HDFS
Tx Manager
Current State
TRANSACTION SNAPSHOTS
Tx Log A
in progress
committed
invalid
read point
write point Tx Log B1
HDFS
TRANSACTION SNAPSHOTS
Tx Log ATx Manager
in progress
committed
invalid
read point
write point
Current State
State Snapshot
in progress
committed
invalid
read point
write point
Tx Log B2
HDFS
TRANSACTION SNAPSHOTS
Tx Log ATx Manager
in progress
committed
invalid
read point
write point
Current State
State Snapshot
in progress
committed
invalid
read point
write point
Tx Log B
Tx Snapshot
in progress
committed
invalid
read point
write point
3
HDFS
TRANSACTION SNAPSHOTS
Tx Log ATx Manager
in progress
committed
invalid
read point
write point
Current State
State Snapshot
in progress
committed
invalid
read point
write point
Tx Log B
Tx Snapshot
in progress
committed
invalid
read point
write point
4
HDFS
HBase
TRANSACTION CLEANUP
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1 rollback failed
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
TRANSACTION CLEANUP:
DATA JANITOR
• RegionObserver coprocessor
• Maintains in-memory snapshot of recent invalid &
in-progress sets
• Periodically updates from transaction snapshot in
HDFS
• Purges data from invalid transactions and older
versions on flush & compaction
HBase
TRANSACTION CLEANUP:
DATA JANITOR
Tx Snapshot
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
refresh
Data Janitor	

(RegionObserver)
MemStore
preFlush()
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Cell TS Value
row1:col1 1004 12
1003 11
1002 11
Custom RegionScanner
HFile
Cell TS Value
HBase
TRANSACTION CLEANUP:
DATA JANITOR
Data Janitor	

(RegionObserver)
HFile
Cell TS Value
Custom RegionScanner
MemStore
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Tx Snapshot
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
preFlush()
Cell TS Value
row1:col1 1004 12
1003 11
1002 11
HBase
TRANSACTION CLEANUP:
DATA JANITOR
Data Janitor	

(RegionObserver)
HFile
Cell TS Value
row1:col1 1004 12Custom RegionScanner
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Tx Snapshot
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
MemStore
preFlush()
Cell TS Value
row1:col1 1004 12
1003 11
1002 11
HBase
TRANSACTION CLEANUP:
DATA JANITOR
Data Janitor	

(RegionObserver)
Custom RegionScanner
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Tx Snapshot
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
MemStore
preFlush()
Cell TS Value
row1:col1 1004 12
1003 11
1002 11
HFile
Cell TS Value
row1:col1 1004 12
HBase
TRANSACTION CLEANUP:
DATA JANITOR
Data Janitor	

(RegionObserver)
Custom RegionScanner
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Tx Snapshot
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
MemStore
preFlush()
Cell TS Value
row1:col1 1004 12
1003 11
1002 11
HFile
Cell TS Value
row1:col1 1004 12
1003 11
HBase
TRANSACTION CLEANUP:
DATA JANITOR
Data Janitor	

(RegionObserver)
Custom RegionScanner
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Tx Snapshot
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
MemStore
preFlush()
Cell TS Value
row1:col1 1004 12
1003 11
1002 11
HFile
Cell TS Value
row1:col1 1004 12
1003 11
HBase
TRANSACTION CLEANUP:
DATA JANITOR
Data Janitor	

(RegionObserver)
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Tx Snapshot
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Custom RegionScanner
preFlush()
MemStore
Cell TS Value
row1:col1 1004 12
1003 11
1002 11
HFile
Cell TS Value
row1:col1 1004 12
1003 11
REDUNDANT VERSIONS CLEANUP
• Every write is new version of a cell

• Heavily updated values (e.g. counters) result
in lots of versions
• Cleanup: keep at most one version of a
cell visible to all transactions
• Plug-in cleanup into existing Data Janitor CP
TTL
• Timestamp of a value is overwritten
with tx id
• Breaks support for “native TTL”
feature in HBase
• Tx ids and timestamps cannot be
aligned: more than 1K tx/sec wanted
TTL
• Encode timestamp in tx id
• tx id = (current ts) * 1,000,000 + counter-
within-ms
• 1B txs/sec max
• long value overflow in ~200 years
• Plug-in cleanup into existing Data Janitor CP
• Apply TTL logic on reads
TRANSACTIONS IN BATCH
• Batch jobs are long running
• No changes should be visible before job is done
• Batch jobs are multi-step
• Individual step can be restarted
• All steps succeed or all fail semantics needed
• Every step retry should be isolated
• Uncommitted changes should be visible within same
step but not to other steps
TRANSACTIONS IN BATCH
• Batch jobs are long-running
• No fine-grained conflict detection: last started
among successfully committed wins
• Delayed rollback execution
• Batch jobs are multi-step
• For now: batch job uses single transaction, every
step re-uses it
• For future: nested transactions for steps
TX OVERHEAD
• 2 RPC calls, each faster than Put in HBase
• Can be improved with batching
• To avoid overhead of transactions where
appropriate the ACID guarantees can be relaxed
• No need to change data format
• When reading: read latest version of a cell
• When writing: use ts = (current ts) * 1M + counter
WHAT’S NEXT?
• Open Source
• Continue Scaling Tx Manager
• Transaction Groups?
• Integration across other transactional stores
QS?
Looking for the chance to work with a team that is defining
a new category within Big Data?
!
We are hiring!
http://continuuity.com/careers
careers[at]continuuity.com
!
!
Alex Baranau @abaranau alexb[at]continuuity.com

More Related Content

Similar to Transactions Over Apache HBase

HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon
 
LeanXcale Presentation - Waterloo University
LeanXcale Presentation - Waterloo UniversityLeanXcale Presentation - Waterloo University
LeanXcale Presentation - Waterloo UniversityRicardo Jimenez-Peris
 
The End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional ManagementThe End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional ManagementRicardo Jimenez-Peris
 
Running Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docx
Running Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docxRunning Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docx
Running Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docxjoellemurphey
 
Successful Architectures for Fast Data
Successful Architectures for Fast DataSuccessful Architectures for Fast Data
Successful Architectures for Fast DataPatrick McFadin
 
Dbms ii mca-ch9-transaction-processing-2013
Dbms ii mca-ch9-transaction-processing-2013Dbms ii mca-ch9-transaction-processing-2013
Dbms ii mca-ch9-transaction-processing-2013Prosanta Ghosh
 
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...Ishan Thakkar
 
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0HBaseCon
 
Multi-service reactive streams using Spring, Reactor, RSocket
Multi-service reactive streams using Spring, Reactor, RSocketMulti-service reactive streams using Spring, Reactor, RSocket
Multi-service reactive streams using Spring, Reactor, RSocketStéphane Maldini
 
Service discovery like a pro (presented at reversimX)
Service discovery like a pro (presented at reversimX)Service discovery like a pro (presented at reversimX)
Service discovery like a pro (presented at reversimX)Eran Harel
 
Databases: Concurrency Control
Databases: Concurrency ControlDatabases: Concurrency Control
Databases: Concurrency ControlDamian T. Gordon
 
transactions-advanced for automatic payment.pptx
transactions-advanced for automatic payment.pptxtransactions-advanced for automatic payment.pptx
transactions-advanced for automatic payment.pptxssusereced02
 
Transaction Support in Pulsar 2.5.0
Transaction Support in Pulsar 2.5.0Transaction Support in Pulsar 2.5.0
Transaction Support in Pulsar 2.5.0StreamNative
 
5 application serversforproject
5 application serversforproject5 application serversforproject
5 application serversforprojectashish61_scs
 
VMworld 2013: Extreme Performance Series: vCenter of the Universe
VMworld 2013: Extreme Performance Series: vCenter of the UniverseVMworld 2013: Extreme Performance Series: vCenter of the Universe
VMworld 2013: Extreme Performance Series: vCenter of the UniverseVMworld
 

Similar to Transactions Over Apache HBase (20)

HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBase
 
LeanXcale Presentation - Waterloo University
LeanXcale Presentation - Waterloo UniversityLeanXcale Presentation - Waterloo University
LeanXcale Presentation - Waterloo University
 
The End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional ManagementThe End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional Management
 
Transaction
TransactionTransaction
Transaction
 
basil.pptx
basil.pptxbasil.pptx
basil.pptx
 
Running Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docx
Running Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docxRunning Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docx
Running Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docx
 
Consul scale
Consul scaleConsul scale
Consul scale
 
Successful Architectures for Fast Data
Successful Architectures for Fast DataSuccessful Architectures for Fast Data
Successful Architectures for Fast Data
 
Dbms ii mca-ch9-transaction-processing-2013
Dbms ii mca-ch9-transaction-processing-2013Dbms ii mca-ch9-transaction-processing-2013
Dbms ii mca-ch9-transaction-processing-2013
 
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...
 
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0
 
Multi-service reactive streams using Spring, Reactor, RSocket
Multi-service reactive streams using Spring, Reactor, RSocketMulti-service reactive streams using Spring, Reactor, RSocket
Multi-service reactive streams using Spring, Reactor, RSocket
 
Service discovery like a pro (presented at reversimX)
Service discovery like a pro (presented at reversimX)Service discovery like a pro (presented at reversimX)
Service discovery like a pro (presented at reversimX)
 
Databases: Concurrency Control
Databases: Concurrency ControlDatabases: Concurrency Control
Databases: Concurrency Control
 
transactions-advanced for automatic payment.pptx
transactions-advanced for automatic payment.pptxtransactions-advanced for automatic payment.pptx
transactions-advanced for automatic payment.pptx
 
19.TRANSACTIONs.ppt
19.TRANSACTIONs.ppt19.TRANSACTIONs.ppt
19.TRANSACTIONs.ppt
 
Transaction Support in Pulsar 2.5.0
Transaction Support in Pulsar 2.5.0Transaction Support in Pulsar 2.5.0
Transaction Support in Pulsar 2.5.0
 
5 application serversforproject
5 application serversforproject5 application serversforproject
5 application serversforproject
 
Data (1)
Data (1)Data (1)
Data (1)
 
VMworld 2013: Extreme Performance Series: vCenter of the Universe
VMworld 2013: Extreme Performance Series: vCenter of the UniverseVMworld 2013: Extreme Performance Series: vCenter of the Universe
VMworld 2013: Extreme Performance Series: vCenter of the Universe
 

Recently uploaded

Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Servicemeghakumariji156
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Arindam Chakraborty, Ph.D., P.E. (CA, TX)
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxSCMS School of Architecture
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdfKamal Acharya
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdfKamal Acharya
 
2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projectssmsksolar
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VDineshKumar4165
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdfKamal Acharya
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationBhangaleSonal
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityMorshed Ahmed Rahath
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersMairaAshraf6
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxSCMS School of Architecture
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayEpec Engineered Technologies
 
Learn the concepts of Thermodynamics on Magic Marks
Learn the concepts of Thermodynamics on Magic MarksLearn the concepts of Thermodynamics on Magic Marks
Learn the concepts of Thermodynamics on Magic MarksMagic Marks
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptMsecMca
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsvanyagupta248
 

Recently uploaded (20)

Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to Computers
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
Learn the concepts of Thermodynamics on Magic Marks
Learn the concepts of Thermodynamics on Magic MarksLearn the concepts of Thermodynamics on Magic Marks
Learn the concepts of Thermodynamics on Magic Marks
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 

Transactions Over Apache HBase

  • 1. TRANSACTIONS OVER HBASE Alex Baranau @abaranau Gary Helmling @gario ! Continuuity
  • 2. WHO WE ARE • We’ve built Continuuity Reactor: the world’s first scale-out application server for Hadoop • Fast, easy development, deployment and management of Hadoop and HBase apps • Continuuity team has years of experience in using and contributing to Open Source, and we intend to continue doing so.
  • 3. AGENDA • Transactions over HBase: Why? What? • Implementation: How? • The approach • Transaction Manager • Transaction in batch jobs
  • 4. THE REACTOR • Continuuity Reactor is an app platform built on Hadoop and HBase • Collect, Process, Store, and Query data. • A Flow is a real-time processor with exactly-once guarantee • A flow is composed of flowlets, connected via queues • All processing happens with ACID guarantees in transactions
  • 5. HBase Table PROCESSING IN A FLOW ...Queue ... ... Flowlet ... ...
  • 6. HBase Table PROCESSING IN A FLOW ...Queue ... ... Flowlet ... ...
  • 8. TRANSACTIONS: WHAT? • Atomic - Entire transaction is committed as one • Consistent - No partial state change due to failure • Isolated - No dirty reads, transaction is only visible after commit • Durable - Once committed, data is persisted reliably
  • 9. HBASE: QUICK “WHAT?” • Open-source non-relational distributed column- oriented database modeled after Google’s BigTable • Data is stored in named Tables of Rows • Row consists of Cells:
 (row key, column family, column, timestamp) -> value • Table is split into Regions, each served by a single node in HBase cluster
  • 10. WHAT ABOUT HBASE? • Atomic operations on cell value: 
 checkAndPut, checkAndDelete, increment, append • Atomic batch of operations on rows within region • No cross region atomic operations support • No cross table atomic operations support • No multi-RPC atomic operations support
  • 11. TRANSACTIONS: WHY ELSE? • Complex data access patterns • Secondary Indexes • Design for greater performance • Efficient batching of data operations • Client-side buffer reliability fix
  • 12. IMPLEMENTATION • Multi-Version Concurrency Control • Cell version (timestamp) = transaction ID • All writes in the same transaction use the transaction ID as timestamp • Reads exclude other, uncommitted transactions (for isolation) • Optimistic Concurrency Control • Conflict detection at commit of transaction • Write Conflict: two overlapping transactions write the same row • Rollback of one transaction in case of conflict (whichever commits later)
  • 13. OPTIMISTIC CONCURRENCY CONTROL • Avoids cost of locking rows and tables • No deadlocks or lock escalations • Cost of conflict detection and possible rollback is higher • Good if conflicts are rare: short transaction, disjoint partitioning of work
  • 14. ZooKeeper TRANSACTIONS IN CONTEXT Tx Manager (standby) HBase Master 1 Master 2 RS 1 RS 2 RS 4 RS 3 Client 1 Client 2 Client N Tx Manager (active)
  • 15. TRANSACTION LIFE CYCLE time out try abort failed roll back in HBase write to HBase do work Client Tx Manager none complete V abortsucceeded in progress start tx start start tx commit try commit check conflicts RPC API invalid X invalidate failed
  • 16. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 Tx Manager Client 1 Client 2 write = 1002 read = 1001
  • 17. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 Tx Manager Client 1 start write = 1002 read = 1001 Client 2 write = 1002 read = 1001
  • 18. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 Tx Manager Client 1 start write = 1002 read = 1001 Client 2 write = 1003 read = 1001 inprogress=[1002]
  • 19. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 Tx Manager Client 1 increment write = 1002 read = 1001 Client 2 write = 1003 read = 1001 inprogress=[1002]
  • 20. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 Tx Manager Client 1 increment write = 1002 read = 1001 Client 2 write = 1003 read = 1001 inprogress=[1002]
  • 21. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 Tx Manager Client 1 start write = 1002 read = 1001 Client 2 write = 1003 read = 1001 inprogress=[1002] write = 1003 read = 1001 excluded=[1002]
  • 22. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 Tx Manager Client 1 start write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002, 1003] write = 1003 read = 1001 excluded=[1002]
  • 23. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 increment write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002, 1003] write = 1003 read = 1001 excluded=[1002]
  • 24. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 commit write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002, 1003] write = 1003 read = 1001 excluded=[1002]
  • 25. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 commit write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002] write = 1003 read = 1001 excluded=[1002]
  • 26. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 commit write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 27. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 conflict! write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 28. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 rollback write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 29. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1003 11 Tx Manager Client 1 rollback write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 30. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1003 11 Tx Manager Client 1 abort write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 31. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1003 11 Tx Manager Client 1 abort write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[]
  • 32. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1003 11 Tx Manager Client 1 abort write = 1002 read = 1001 Client 2 write = 1004 read = 1003 inprogress=[]
  • 33. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1003 11 Tx Manager Client 1 start Client 2 write = 1005 read = 1003 inprogress=[] write = 1004 read = 1003
  • 34. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1003 11 Tx Manager Client 1 read Client 2 write = 1005 read = 1003 inprogress=[] write = 1004 read = 1003
  • 35. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 conflict! write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 36. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 rollback write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 37. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 rollback failed write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 38. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 invalidate write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 39. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 invalidate write = 1002 read = 1001 Client 2 write = 1004 read = 1003 inprogress=[] invalid=[1002]
  • 40. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 start Client 2 write = 1005 read = 1003 inprogress=[] invalid=[1002] write = 1004 read = 1003 exclude = [1002]
  • 41. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 read Client 2 write = 1005 read = 1003 inprogress=[] invalid=[1002] write = 1004 read = 1003 exclude = [1002] invisible!
  • 42. TRANSACTION MANAGER • Create new transactions • Provides monotonically increasing write pointers • Maintains all in-progress, committed, and invalid transactions • Detect conflicts • Transaction = Write Pointer: Timestamp for HBase writes Read pointer: Upper bound timestamp for reads Excludes: List of timestamps to exclude from reads
  • 43. TRANSACTION MANAGER • Simple & Fast • All required state is in-memory • Single point of failure? • Persist all state to a write-ahead log • Secondary Tx Manager watches for failure of Primary • Failover can happen quickly
  • 44. TRANSACTION MANAGER Tx Manager Current State in progress committed invalid read point write point start()
  • 45. TRANSACTION MANAGER Tx Manager Current State in progress (+) committed invalid read point write point ++ start() Tx Log started, <write pt> HDFS
  • 46. TRANSACTION MANAGER Tx Manager Current State in progress (-) committed (+) invalid read point write point commit() Tx Log start, <write pt> commit, <write pt> HDFS
  • 47. TRANSACTION SNAPSHOTS • Write-ahead log provides persistence • Guarantees point-in-time recovery • Longer the log grows, longer recovery takes • Periodically write snapshot of full transaction state • Snapshot + all new logs provides full state
  • 48. Tx Manager Current State TRANSACTION SNAPSHOTS Tx Log A in progress committed invalid read point write point HDFS
  • 49. Tx Manager Current State TRANSACTION SNAPSHOTS Tx Log A in progress committed invalid read point write point Tx Log B1 HDFS
  • 50. TRANSACTION SNAPSHOTS Tx Log ATx Manager in progress committed invalid read point write point Current State State Snapshot in progress committed invalid read point write point Tx Log B2 HDFS
  • 51. TRANSACTION SNAPSHOTS Tx Log ATx Manager in progress committed invalid read point write point Current State State Snapshot in progress committed invalid read point write point Tx Log B Tx Snapshot in progress committed invalid read point write point 3 HDFS
  • 52. TRANSACTION SNAPSHOTS Tx Log ATx Manager in progress committed invalid read point write point Current State State Snapshot in progress committed invalid read point write point Tx Log B Tx Snapshot in progress committed invalid read point write point 4 HDFS
  • 53. HBase TRANSACTION CLEANUP Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 rollback failed write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 54. TRANSACTION CLEANUP: DATA JANITOR • RegionObserver coprocessor • Maintains in-memory snapshot of recent invalid & in-progress sets • Periodically updates from transaction snapshot in HDFS • Purges data from invalid transactions and older versions on flush & compaction
  • 55. HBase TRANSACTION CLEANUP: DATA JANITOR Tx Snapshot read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] refresh Data Janitor (RegionObserver) MemStore preFlush() read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Cell TS Value row1:col1 1004 12 1003 11 1002 11 Custom RegionScanner HFile Cell TS Value
  • 56. HBase TRANSACTION CLEANUP: DATA JANITOR Data Janitor (RegionObserver) HFile Cell TS Value Custom RegionScanner MemStore read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Tx Snapshot read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] preFlush() Cell TS Value row1:col1 1004 12 1003 11 1002 11
  • 57. HBase TRANSACTION CLEANUP: DATA JANITOR Data Janitor (RegionObserver) HFile Cell TS Value row1:col1 1004 12Custom RegionScanner read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Tx Snapshot read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] MemStore preFlush() Cell TS Value row1:col1 1004 12 1003 11 1002 11
  • 58. HBase TRANSACTION CLEANUP: DATA JANITOR Data Janitor (RegionObserver) Custom RegionScanner read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Tx Snapshot read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] MemStore preFlush() Cell TS Value row1:col1 1004 12 1003 11 1002 11 HFile Cell TS Value row1:col1 1004 12
  • 59. HBase TRANSACTION CLEANUP: DATA JANITOR Data Janitor (RegionObserver) Custom RegionScanner read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Tx Snapshot read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] MemStore preFlush() Cell TS Value row1:col1 1004 12 1003 11 1002 11 HFile Cell TS Value row1:col1 1004 12 1003 11
  • 60. HBase TRANSACTION CLEANUP: DATA JANITOR Data Janitor (RegionObserver) Custom RegionScanner read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Tx Snapshot read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] MemStore preFlush() Cell TS Value row1:col1 1004 12 1003 11 1002 11 HFile Cell TS Value row1:col1 1004 12 1003 11
  • 61. HBase TRANSACTION CLEANUP: DATA JANITOR Data Janitor (RegionObserver) read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Tx Snapshot read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Custom RegionScanner preFlush() MemStore Cell TS Value row1:col1 1004 12 1003 11 1002 11 HFile Cell TS Value row1:col1 1004 12 1003 11
  • 62. REDUNDANT VERSIONS CLEANUP • Every write is new version of a cell • Heavily updated values (e.g. counters) result in lots of versions • Cleanup: keep at most one version of a cell visible to all transactions • Plug-in cleanup into existing Data Janitor CP
  • 63. TTL • Timestamp of a value is overwritten with tx id • Breaks support for “native TTL” feature in HBase • Tx ids and timestamps cannot be aligned: more than 1K tx/sec wanted
  • 64. TTL • Encode timestamp in tx id • tx id = (current ts) * 1,000,000 + counter- within-ms • 1B txs/sec max • long value overflow in ~200 years • Plug-in cleanup into existing Data Janitor CP • Apply TTL logic on reads
  • 65. TRANSACTIONS IN BATCH • Batch jobs are long running • No changes should be visible before job is done • Batch jobs are multi-step • Individual step can be restarted • All steps succeed or all fail semantics needed • Every step retry should be isolated • Uncommitted changes should be visible within same step but not to other steps
  • 66. TRANSACTIONS IN BATCH • Batch jobs are long-running • No fine-grained conflict detection: last started among successfully committed wins • Delayed rollback execution • Batch jobs are multi-step • For now: batch job uses single transaction, every step re-uses it • For future: nested transactions for steps
  • 67. TX OVERHEAD • 2 RPC calls, each faster than Put in HBase • Can be improved with batching • To avoid overhead of transactions where appropriate the ACID guarantees can be relaxed • No need to change data format • When reading: read latest version of a cell • When writing: use ts = (current ts) * 1M + counter
  • 68. WHAT’S NEXT? • Open Source • Continue Scaling Tx Manager • Transaction Groups? • Integration across other transactional stores
  • 69. QS? Looking for the chance to work with a team that is defining a new category within Big Data? ! We are hiring! http://continuuity.com/careers careers[at]continuuity.com ! ! Alex Baranau @abaranau alexb[at]continuuity.com