5. Lily 101
» data repository on top of HBase
» records with fields
» rich data types + schema
» versioning
» Java + REST API
» indexes into Solr (et al)
» a bunch more: smart data at scale, made easy
» Apache license - www.lilyproject.org
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 5
6. use of rowlog inside lily
» feed Solr index with (Lily|HBase) record updates
» maintain secondary indices (e.g. linkindex)
» shared concerns:
» reliability
» consistency
» manageability
» (scalability)
7. UC1: message queue (mq)
record update ➙ Indexer ➙ update Solr index entry
possible failure
10. MQ requirements
» async (cope with Solr ‘lag’)
» guaranteed execution
» no concurrent processing of 2 msgs about the same record
» no extra tech (HBase should be good enough)
» avoids added management complexity
» benefits from HBase's scalability, resilience, etc.
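The requirements above can be illustrated with a toy in-memory queue. This is only a sketch under stated assumptions: `RecordQueue`, `publish` and `process_once` are hypothetical names, not part of Lily's RowLog API, and a plain Python deque stands in for the HBase table. It shows async hand-off, retry until success (guaranteed execution), and a per-record lock so two msgs about the same record are never handled at once.

```python
import threading
from collections import defaultdict, deque


class RecordQueue:
    """Toy in-memory stand-in for the MQ (hypothetical, not Lily's API)."""

    def __init__(self):
        self._pending = deque()                      # (record_id, payload), arrival order
        self._locks = defaultdict(threading.Lock)    # one lock per record id

    def publish(self, record_id, payload):
        # Async: the writer returns immediately; indexing happens later.
        self._pending.append((record_id, payload))

    def process_once(self, handler):
        """Try every pending msg once; return how many must be retried."""
        retry = deque()
        while self._pending:
            record_id, payload = self._pending.popleft()
            lock = self._locks[record_id]
            if not lock.acquire(blocking=False):
                # Another worker is busy with this record: try again later.
                retry.append((record_id, payload))
                continue
            try:
                handler(record_id, payload)
            except Exception:
                # Guaranteed execution: a failed msg stays on the queue.
                retry.append((record_id, payload))
            finally:
                lock.release()
        self._pending = retry
        return len(retry)
```

A handler that fails once (say, because Solr is lagging) simply sees the same msg again on the next `process_once` call.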
11. UC2: write-ahead-log (WAL)
» secondary actions
» pushing messages onto MQ (!)
» updating secondary indices (e.g. linkindex)
» requirements
» sec. actions eventually get executed, in predefined order
» further updates to record denied until sec. actions succeeded
» synchronous
» pre-update: check WAL for outstanding actions + cleanup mechanism
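A minimal sketch of the WAL behaviour listed above, assuming an in-memory dict instead of HBase. `WalRecordStore`, `SecondaryActionsPending` and `outstanding` are hypothetical names for illustration only. Secondary actions run synchronously in a predefined order; if one fails it stays in the WAL, and the next update to that record first retries the outstanding actions and is denied if they still fail.

```python
class SecondaryActionsPending(Exception):
    """Raised when an update is denied because WAL actions are outstanding."""


class WalRecordStore:
    """Toy record store with a per-record write-ahead log (hypothetical)."""

    def __init__(self, actions):
        self._actions = list(actions)   # ordered (name, callable) pairs
        self._by_name = dict(actions)
        self._data = {}                 # record_id -> current value
        self._wal = {}                  # record_id -> outstanding action names

    def outstanding(self, record_id):
        return list(self._wal.get(record_id, []))

    def _run_outstanding(self, record_id):
        names = self._wal.get(record_id, [])
        while names:
            # Execute in the predefined order; an exception leaves the rest queued.
            self._by_name[names[0]](record_id, self._data[record_id])
            names.pop(0)
        self._wal.pop(record_id, None)

    def update(self, record_id, value):
        # Pre-update check: clean up outstanding actions or deny the update.
        try:
            self._run_outstanding(record_id)
        except Exception as exc:
            raise SecondaryActionsPending(record_id) from exc
        self._data[record_id] = value
        self._wal[record_id] = [name for name, _ in self._actions]
        try:
            self._run_outstanding(record_id)   # synchronous attempt
        except Exception:
            pass  # actions stay in the WAL and block the next update
```

In this sketch the record update itself succeeds even when a secondary action fails; only the following update to the same record is held back until the WAL is clean.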
13. global queue
» separate HBase table
» 1 msg per record update per subscription
» key = (shard id +) subscription ID + timestamp + (data table) rowkey + sequence nr
» rowlog processor (single instance, managed by ZK)
» data always appended/deleted from table end (boo!)
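The key layout above could be encoded roughly as follows. The byte encoding is an assumption (the slides only name the components): a one-byte shard id, the subscription ID terminated by a zero byte, and fixed-width big-endian integers for timestamp and sequence nr, so that byte-wise ordering (as HBase sorts row keys) matches timestamp order within one subscription. `global_queue_key` is a hypothetical helper.

```python
import struct


def global_queue_key(subscription_id, timestamp_ms, row_key, seq_nr, shard_id=None):
    """Build a global-queue row key (hypothetical encoding, for illustration)."""
    parts = []
    if shard_id is not None:
        parts.append(struct.pack(">B", shard_id))          # optional 1-byte shard prefix
    parts.append(subscription_id.encode("utf-8") + b"\x00")  # zero-terminated subscription ID
    parts.append(struct.pack(">Q", timestamp_ms))          # 8-byte big-endian timestamp
    parts.append(row_key)                                   # rowkey of the data table row
    parts.append(struct.pack(">Q", seq_nr))                # 8-byte big-endian sequence nr
    return b"".join(parts)


older = global_queue_key("indexer", 1000, b"rowA", 1)
newer = global_queue_key("indexer", 2000, b"rowA", 2)
# Within a subscription, byte order follows timestamp order.
assert sorted([newer, older]) == [older, newer]
```

Because new keys always sort after older ones for a subscription, writes and deletes concentrate at the end of the table, which is the hotspot the last bullet complains about.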
15. row-local queue
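[figure: data table rows (ROW X, ROW Y, ROW Z), each carrying two rowlog column families next to its data: CF1 = payload, CF2 = execution state; columns are keyed by message ID (1, 2); payload cells hold payload data, execution state cells hold consumer id + state]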
16. why row-local queue?
» predates Inbox-concept (Google Megastore)
» msgs will appear on rowlog if and only if updates have really happened
» rely on atomic row operation guarantee of HBase
» msgs on global queue without local counterparts can be discarded
» ‘msgs’ on global rowlog can be small
» just point to msgs in row-local queue
» actual payload sits there
» optimized processing of msgs per row (i.e. combine)
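The interplay between the two queues can be sketched with plain dicts and lists standing in for HBase tables. All names here (`put_with_message`, `process`, the row layout) are hypothetical. The record data and its row-local msg are written in one row mutation (atomic per row in HBase); the global queue only stores small (rowkey, message ID) pointers; pointers without a local counterpart are discarded; and msgs for the same row are combined into one consumer call.

```python
def put_with_message(rows, global_queue, row_key, data, msg_id, payload):
    """Write data + row-local msg together, then add a pointer to the global queue."""
    row = rows.setdefault(row_key, {"data": None, "payload": {}, "state": {}})
    # In HBase these cells would go into a single, atomic row put.
    row["data"] = data
    row["payload"][msg_id] = payload
    row["state"][msg_id] = "pending"
    # The global msg is small: it just points at the row-local msg.
    global_queue.append((row_key, msg_id))


def process(rows, global_queue, consume):
    """Resolve pointers, drop orphans, and hand each row's msgs over in one call."""
    by_row = {}
    for row_key, msg_id in global_queue:
        payload = rows.get(row_key, {}).get("payload", {}).get(msg_id)
        if payload is None:
            continue  # no local counterpart: the update never happened, discard
        by_row.setdefault(row_key, []).append((msg_id, payload))
    for row_key, msgs in by_row.items():
        consume(row_key, [payload for _, payload in msgs])  # combined per row
        for msg_id, _ in msgs:
            del rows[row_key]["payload"][msg_id]
            rows[row_key]["state"][msg_id] = "done"
    global_queue.clear()
```

Two updates to the same row therefore reach the consumer as a single call with both payloads, while a dangling pointer is dropped without side effects.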
17. rowlog sharding
» MQ and WAL tables tend to be smallish
» MQ depends on performance of Solr indexing
» WAL size = number of simultaneous operations
» risk for contention (all data in one region)
➡ introduction of RowLog sharding (Lily 1.1)
➡ continuous puts/deletes on HBase table = not very efficient ➙ long-term need to replace this
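One common way to spread such a small, hot table over several regions is to prefix each key with a shard id derived from the rowkey. The slides do not say how Lily 1.1 picks the shard, so the CRC32-modulo scheme and the names `shard_for` and `sharded_key` below are assumptions for illustration.

```python
import zlib


def shard_for(row_key: bytes, num_shards: int) -> int:
    """Pick a stable shard for a rowkey (assumed scheme: CRC32 modulo shard count)."""
    return zlib.crc32(row_key) % num_shards


def sharded_key(row_key: bytes, num_shards: int, rest: bytes) -> bytes:
    """Prefix the remainder of the queue key with a 1-byte shard id."""
    if not 1 <= num_shards <= 256:
        raise ValueError("a 1-byte prefix supports 1..256 shards")
    return bytes([shard_for(row_key, num_shards)]) + rest


# The same record always lands in the same shard, so per-record ordering is kept.
assert shard_for(b"rowA", 4) == shard_for(b"rowA", 4)
```

With the table pre-split on the shard prefix, each shard can live in its own region, so puts and deletes no longer all hit a single region.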
18. last words
» RowLog library can be used independently of Lily (!)
» part of the Lily source tree
» Apache license
» www.lilyproject.org
» shameless plug: go and check out Lily, the HBase+Solr-backed repository for content-centric apps
19. Thank you!
for your attention
for your questions
» stevenn@outerthought.org
» @stevenn