The Lily RowLog library

NGDATA

Presentation on the Lily RowLog library as presented to the HBase/Hadoop meetup on the eve of Hadoop World 2011


Lily
A SMART DATA PLATFORM
MAKING BIG DATA APPS EASY

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

the (lily) rowlog library

Lily Architecture
(components)


Lily Architecture

?
(components)


Lily 101

(more info? Hadoop World, Tuesday 1:15PM, Met Ballroom)

» data repository on top of HBase
» records with fields
» rich data types + schema
» versioning
» Java + REST API
» indexes into Solr (et al)
» a bunch more: smart data at scale, made easy
» Apache license - www.lilyproject.org


use of rowlog inside lily

» feed Solr index with (Lily|HBase) record updates
» maintain secondary indices (e.g. linkindex)
» shared concerns:
    » reliability
    » consistency
    » manageability
    » (scalability)


UC1: message queue (mq)

record update ➙ Indexer ➙ update Solr index entry

(possible failure along the way)


UC1: message queue (mq)

record update ➙ MQ (?) ➙ Indexer ➙ update Solr index entry


UC1: message queue (mq)

record update ➙ MQ ➙ Indexer (multiple instances) ➙ update Solr index entry


MQ requirements

» async (cope with Solr ‘lag’)
» guaranteed execution
» no concurrent processing of 2 msgs about the same record
» no extra tech (HBase should be good enough)
    » management complexity
    » benefits from scalability, resilience, etc.
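
The "no concurrent processing of 2 msgs about the same record" requirement can be illustrated with a small in-process sketch. This is not the Lily RowLog API: the names `PerRecordGate` and `process` are hypothetical, and a distributed setup would need the equivalent guard in shared storage rather than in one process's memory.

```python
import threading


class PerRecordGate:
    """Hypothetical guard: at most one in-flight message per record id."""

    def __init__(self):
        self._guard = threading.Lock()
        self._in_flight = set()

    def try_acquire(self, record_id):
        # Returns False when another message for this record is being processed.
        with self._guard:
            if record_id in self._in_flight:
                return False
            self._in_flight.add(record_id)
            return True

    def release(self, record_id):
        with self._guard:
            self._in_flight.discard(record_id)


def process(gate, record_id, handler):
    """Run handler for record_id unless the record is already being processed."""
    if not gate.try_acquire(record_id):
        return False  # message stays queued; the caller retries later
    try:
        handler(record_id)
        return True
    finally:
        gate.release(record_id)
```

Messages about different records can still be handled in parallel; only messages about the same record are serialized.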


UC2: write-ahead-log (WAL)
» secondary actions
    » pushing messages onto MQ (!)
    » updating secondary indices (e.g. linkindex)
» requirements
    » sec. actions eventually get executed, in predefined order
    » further updates to record denied until sec. actions succeeded
» synchronous
» pre-update: check WAL for outstanding actions + cleanup mechanism
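
A minimal in-memory sketch of the WAL behaviour listed above, under loud assumptions: `RowWal`, its `pending` map and the action callables are hypothetical names, and a plain dict stands in for the HBase table. It only shows the control flow: outstanding actions are replayed before a new update, in a fixed order, and a failing action blocks further updates to that record.

```python
class RowWal:
    """Hypothetical write-ahead log for ordered secondary actions per row."""

    def __init__(self, actions):
        self.actions = list(actions)  # secondary actions, in predefined order
        self.pending = {}             # row_key -> [(action_index, payload), ...]

    def update(self, store, row_key, value):
        # Pre-update: finish outstanding actions first. If one still fails,
        # the exception propagates and this update is denied.
        self._replay(row_key)
        # Write the data and log the secondary actions (one atomic row
        # operation in a real store; two dict writes in this sketch).
        store[row_key] = value
        self.pending[row_key] = [(i, value) for i in range(len(self.actions))]
        # Synchronous execution of the secondary actions.
        self._replay(row_key)

    def _replay(self, row_key):
        queue = self.pending.get(row_key, [])
        while queue:
            index, payload = queue[0]
            self.actions[index](row_key, payload)  # on failure the entry stays pending
            queue.pop(0)
        self.pending.pop(row_key, None)            # cleanup once everything ran
```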

the rowlog library

[diagram: RowLog instances with their subscriptions; each subscription feeds a listener, either in the same VM or remote via Netty; queue storage in HBase, global and row-local]


global queue

» separate HBase table
» 1 msg per record update per subscription
» key = (shard id +) subscription ID + timestamp + (data table) rowkey + sequence nr
» rowlog processor (single instance, managed by ZK)
» data always appended/deleted from table end (boo!)
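
The key layout above could be encoded roughly as follows. The field widths, the length prefix and the function name `global_queue_key` are assumptions made for illustration; the slides only give the order of the components. Big-endian integers are used so that byte-wise key order matches chronological order within one subscription (and shard) prefix, which is also why new messages always land at the end of that key range.

```python
import struct


def global_queue_key(subscription_id, timestamp_ms, row_key, seq_nr, shard_id=None):
    """Hypothetical encoding: (shard) + subscription + timestamp + rowkey + seq."""
    parts = []
    if shard_id is not None:
        parts.append(bytes([shard_id]))            # optional 1-byte shard prefix (0-255)
    sub = subscription_id.encode("utf-8")
    parts.append(bytes([len(sub)]) + sub)          # length-prefixed subscription id
    parts.append(struct.pack(">Q", timestamp_ms))  # big-endian: sorts chronologically
    parts.append(row_key)                          # rowkey in the data table (bytes)
    parts.append(struct.pack(">Q", seq_nr))        # sequence number within the row
    return b"".join(parts)
```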


row-local queue

[diagram: RECORDS table (HBase); each row (ROW 1 to ROW 4) holds its row-local queue next to its data]


row-local queue
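[diagram: row-local queue columns for ROW X, ROW Y, ROW Z]
» CF1 = data payload: one column per message ID (1, 2), holding the payload data
» CF2 = execution state: one column per message ID (1, 2), holding the state per consumer id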


why row-local queue?
» predates Inbox-concept (Google Megastore)
» msgs will appear on rowlog if and only if updates have really happened
    » rely on atomic row operation guarantee of HBase
    » msgs on global queue without local counterparts can be discarded
» ‘msgs’ on global rowlog can be small
    » just point to msgs in row-local queue
    » actual payload sits there
» optimized processing of msgs per row (e.g. combine)
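
The pointer idea above can be sketched with plain dicts standing in for the HBase tables. Everything named here (`put_with_message`, `resolve`, the dict layout) is hypothetical and only illustrates the principle: payload and execution state live in the record's own row, the global queue carries a small pointer, and a pointer without a row-local counterpart is dropped.

```python
def put_with_message(records, global_queue, row_key, data, payload, subscriptions):
    """Store data, payload and execution state in one row; enqueue small pointers."""
    row = records.setdefault(
        row_key, {"data": None, "payload": {}, "state": {}, "next_seq": 0}
    )
    seq = row["next_seq"]
    # In HBase these writes would go into a single atomic row operation.
    row["data"] = data
    row["payload"][seq] = payload
    row["state"][seq] = {sub: "pending" for sub in subscriptions}
    row["next_seq"] = seq + 1
    # The global queue only needs a pointer: (subscription, rowkey, sequence nr).
    for sub in subscriptions:
        global_queue.append((sub, row_key, seq))
    return seq


def resolve(records, pointer):
    """Return the payload a pointer refers to, or None for an orphan pointer."""
    _sub, row_key, seq = pointer
    row = records.get(row_key)
    if row is None or seq not in row["payload"]:
        return None  # no row-local counterpart: the pointer can be discarded
    return row["payload"][seq]
```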


rowlog sharding

» MQ and WAL tables tend to be smallish
    » MQ depends on performance of Solr indexing
    » WAL size = number of simultaneous operations
» risk for contention (all data in one region)
➡ introduction of RowLog sharding (Lily 1.1)

➡ continuous puts/deletes on HBase table = not very efficient ➙ long-term need to replace this
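
One common way to spread a small, hot table over several regions is to prefix each key with a shard id derived from the rowkey. The sketch below assumes that approach; the slides do not say how Lily 1.1 computes its shard id, so `shard_for` and `sharded_key` are purely illustrative.

```python
import zlib


def shard_for(row_key, num_shards):
    """Hypothetical shard function: stable hash of the rowkey, modulo shard count."""
    return zlib.crc32(row_key) % num_shards


def sharded_key(row_key, queue_key, num_shards):
    """Prefix a queue key with a 1-byte shard id (assumes num_shards <= 256)."""
    return bytes([shard_for(row_key, num_shards)]) + queue_key
```

With the shard id as the leading byte, the table can be pre-split on shard boundaries so that writes for different shards go to different regions.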


last words

» RowLog library can be used independent from Lily (!)
» part of the Lily source tree
» Apache license

» www.lilyproject.org
» shameless plug: go and check out Lily, HBase+Solr-backed repository for content-centric apps


Thank you !
for your attention
for your questions

» stevenn@outerthought.org

» @stevenn
