Apache BookKeeper
A High Performance and Low Latency Storage Service
@sijieg (Sijie Guo, Twitter)
@jvjujjuri (JV, Salesforce)
I am Sijie Guo
- PMC Chair of Apache BookKeeper
- Co-creator of Apache DistributedLog
- Twitter Messaging/Pub-Sub Team
- Yahoo! R&D Beijing
Hello!
Challenges in Distributed Systems
Expect Failures
up to 10% annual failure rates for disks/servers
Symptoms
Problem 1: Not Available
Problem 2: Inconsistencies
CAP
More Issues
Problem 3: Split Brain
[Diagram: Writer A and Writer A’ both writing (two writers)]
Problem 4: Failure Detection
[Diagram: nodes A, B, C]
Problem 5: Recovery
[Diagram: nodes A, B, C]
Recovery Protocol
Consistency
Solutions
Overview
Enter Apache BookKeeper
BookKeeper - Durable Storage
A Durable Storage Optimized for Immutable Data
Serve as a building block for reliable systems
Commodity Hardware
Durability
Replication Consistency Recovery
Client Library
Immutable Data Abstraction
Ledger
◉ Segment
◉ Block / Object
◉ Append-Only File
◉ ...
Guarantees
If an entry has been acknowledged, it must be readable.
If an entry is read once, it must always be readable.
History
◉ Initial Use Case - Hadoop NameNode HA
◉ 2008: Open sourced as a ZooKeeper contrib
◉ 2011: Sub-Project of ZooKeeper
◉ 2012: Yahoo! Push Notification
◉ 2012~Now: DistributedLog, Pulsar, Majordodo
◉ 2015~Now: Salesforce Distributed Store
Inside of Apache BookKeeper
Details
Architecture
[Diagram: APP, Client, Metadata Store, Ledger, and three Bookies]
Reliable Writes
◉ Store checksum along with entry
◉ Fsync entries before responding
◉ Ack when accepted by a quorum of bookies:
○ All Previous Entries
○ This Entry
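The write rules on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not BookKeeper's client code: `Bookie`, `QuorumWriter`, and `ack_quorum` are hypothetical names, the bookies are in-memory objects, CRC32 stands in for whatever checksum is configured, and the fsync step is reduced to a comment.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Bookie:
    """Hypothetical in-memory stand-in for a storage node."""
    name: str
    up: bool = True
    store: dict = field(default_factory=dict)

    def add_entry(self, entry_id: int, payload: bytes, checksum: int) -> bool:
        if not self.up:
            return False
        # Reject an entry whose payload does not match its checksum.
        if zlib.crc32(payload) != checksum:
            return False
        # A real node would fsync its journal here before responding.
        self.store[entry_id] = (payload, checksum)
        return True


class QuorumWriter:
    """Acks an entry only when it and all previous entries reached the quorum."""

    def __init__(self, bookies, ack_quorum):
        self.bookies = bookies
        self.ack_quorum = ack_quorum
        self.accepted = {}    # entry id -> number of bookies that accepted it
        self.last_acked = -1  # highest entry id acknowledged to the application

    def add(self, entry_id: int, payload: bytes) -> bool:
        checksum = zlib.crc32(payload)
        self.accepted[entry_id] = sum(
            b.add_entry(entry_id, payload, checksum) for b in self.bookies
        )
        self._advance()
        return self.last_acked >= entry_id

    def _advance(self):
        # Move forward only over a contiguous run of quorum-accepted entries.
        nxt = self.last_acked + 1
        while self.accepted.get(nxt, 0) >= self.ack_quorum:
            self.last_acked = nxt
            nxt += 1
```

With three bookies and `ack_quorum=2`, an entry that reaches only one bookie is not acknowledged, and later entries stay unacknowledged until the gap is filled by a retry.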
Consistency - LastAddPushed
[Diagram: ledger entries 0 to 12 with a LastAddPushed marker; the Writer adds entries]
Consistency - LastAddConfirmed
[Diagram: ledger entries 0 to 12 with LastAddConfirmed markers; labels: Reader, Writer, Ownership Changed, Add entries, Ack Adds]
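The two pointers can be illustrated with a small sketch. It assumes a hypothetical single-process `Ledger` class (not the BookKeeper API): LastAddPushed follows what the writer has sent, LastAddConfirmed advances only over a contiguous prefix of acknowledged entries, and readers are not allowed past LastAddConfirmed.

```python
class Ledger:
    """Hypothetical in-memory ledger tracking the two pointers."""

    def __init__(self):
        self.entries = []
        self.acked = set()
        self.last_add_pushed = -1     # last entry the writer has sent
        self.last_add_confirmed = -1  # last entry of the acknowledged prefix

    def push(self, payload) -> int:
        """Writer side: send an entry and return its id."""
        self.entries.append(payload)
        self.last_add_pushed += 1
        return self.last_add_pushed

    def ack(self, entry_id: int) -> None:
        """Record an acknowledgement; acks may arrive out of order."""
        self.acked.add(entry_id)
        while self.last_add_confirmed + 1 in self.acked:
            self.last_add_confirmed += 1

    def read(self, entry_id: int):
        """Reader side: only entries up to LastAddConfirmed are visible."""
        if entry_id > self.last_add_confirmed:
            raise LookupError(f"entry {entry_id} is not confirmed yet")
        return self.entries[entry_id]
```

Bounding reads by LastAddConfirmed is one way to honor the guarantee that an entry read once must always be readable: a reader never observes an entry that could still be lost.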
Fencing
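Fencing is what stops the previous writer once ownership changes. The sketch below is deliberately simplified and rests on assumptions: `Bookie`, `FencedError`, and `take_ownership` are hypothetical names, every bookie is fenced rather than a quorum, and the re-replication and ledger-closing steps of a real recovery are left out.

```python
class FencedError(Exception):
    """Raised when a write reaches a bookie after the ledger was fenced."""


class Bookie:
    """Hypothetical in-memory bookie holding a single ledger."""

    def __init__(self):
        self.entries = {}
        self.fenced = False

    def add_entry(self, entry_id, payload):
        if self.fenced:
            raise FencedError(f"ledger fenced, rejecting entry {entry_id}")
        self.entries[entry_id] = payload

    def fence(self):
        """Mark the ledger fenced and report the last entry id stored here."""
        self.fenced = True
        return max(self.entries, default=-1)


def take_ownership(bookies):
    """New owner: fence the bookies, then return the highest entry id seen."""
    return max(bookie.fence() for bookie in bookies)
```

After `take_ownership` returns, any add from the old writer fails with `FencedError`, so two writers cannot both extend the same ledger.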
Read Entry & Read LAC
[Diagram: Client sends Read Entry K to bookies B1, B2, B3]
Speculative Reads
On Timeouts
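A speculative read can be sketched with `asyncio`. This is an assumption-laden illustration, not the client's implementation: bookies are modeled as hypothetical `(name, latency)` pairs, `read_from` fakes the network call with a sleep, and `speculative_timeout` is an invented parameter name. The idea shown is that the client asks the next replica when the current one is slow, without cancelling the earlier request, and takes whichever answer arrives first.

```python
import asyncio


async def read_from(bookie, entry_id):
    """Hypothetical read: a bookie is a (name, latency_seconds) pair."""
    name, latency = bookie
    await asyncio.sleep(latency)
    return name, f"entry-{entry_id}"


async def speculative_read(bookies, entry_id, speculative_timeout):
    """Ask replicas one by one; move on when the timeout passes with no answer."""
    if not bookies:
        raise ValueError("at least one bookie is required")
    pending = set()
    try:
        for bookie in bookies:
            pending.add(asyncio.create_task(read_from(bookie, entry_id)))
            done, pending = await asyncio.wait(
                pending,
                timeout=speculative_timeout,
                return_when=asyncio.FIRST_COMPLETED,
            )
            if done:
                return done.pop().result()
        # Every replica has been asked; wait for the first one to respond.
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        return done.pop().result()
    finally:
        # Drop the requests that are no longer needed.
        for task in pending:
            task.cancel()
```

For example, `asyncio.run(speculative_read([("B1", 0.5), ("B2", 0.01), ("B3", 0.01)], 7, 0.05))` gives up waiting on the slow B1 after 50 ms and is answered by B2.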
[Diagram: Client sends Read LAC to bookies B1, B2, B3 as a quorum read]
Long Poll Read
[Diagram: Client sends a Long Poll Read to bookies B1, B2, B3; speculative long poll]
Inside a Bookie
Use Cases
Apache BookKeeper as a Building Block
Projects built on BookKeeper
◉ Twitter: Apache DistributedLog
◉ Yahoo: Pulsar - Cloud Messaging Service
◉ Salesforce: Distributed Store
◉ Huawei - HDFS NameNode HA
◉ HubSpot - WAL
◉ Majordodo - Distributed Resource Manager
Apache DistributedLog
(Twitter)
Apache DistributedLog
[Figure: a log stream of records ordered from oldest to newest, divided into Log Segment X, Log Segment X+1, and Log Segment X+2]
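The log-segment layout on this slide can be illustrated with a toy model. Everything here is an assumption made for illustration: `LogStream`, `LogSegment`, and `max_records_per_segment` are hypothetical names, segments roll by record count, and records live in Python lists rather than in BookKeeper ledgers.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class LogSegment:
    segment_no: int
    first_record_id: int
    records: list = field(default_factory=list)


class LogStream:
    """Hypothetical log stream: an ordered list of segments, appended at the tail."""

    def __init__(self, max_records_per_segment: int):
        self.max_records_per_segment = max_records_per_segment
        self.segments = []
        self.next_record_id = 1

    def append(self, payload) -> int:
        # Roll to a new segment when there is none or the newest one is full.
        if (not self.segments
                or len(self.segments[-1].records) >= self.max_records_per_segment):
            self.segments.append(
                LogSegment(len(self.segments), self.next_record_id)
            )
        self.segments[-1].records.append(payload)
        record_id = self.next_record_id
        self.next_record_id += 1
        return record_id

    def read(self, record_id: int):
        # Binary search for the segment whose first record id is <= record_id.
        starts = [s.first_record_id for s in self.segments]
        idx = bisect.bisect_right(starts, record_id) - 1
        if idx < 0:
            raise KeyError(record_id)
        segment = self.segments[idx]
        offset = record_id - segment.first_record_id
        if offset >= len(segment.records):
            raise KeyError(record_id)
        return segment.records[offset]
```

Because only the newest segment is ever written, older segments are immutable and can be located with a lookup on their first record id.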
Apache BookKeeper
Apache DistributedLog
MetadataStore
Log Segment Store (BK)
Cold Storage (HDFS)
Log Streams - Abstraction & Naming
- Data Management
- Efficient Write & Read
- Intra-cluster & Geo Replication
- Segments
- Raw Streams
Write Proxy
Read Proxy
- Ownership Tracking
- Batching, Compression
- Record Cache
- Rate Limiting, Quota
- Serving
- Applications
- Different Consumer models
DBs - e.g., Twitter’s Manhattan
Deferred RPC (queuing)
Self-serve Pub/Sub
Stream Computing
Cross DC Replication
DistributedLog at Twitter
◉ Manhattan Key/Value Store - WAL
◉ Durable Deferred RPC - Journal
◉ Real-Time Search Indexing - Change Propagation
◉ Self-serve Pub/Sub - Message Delivery, Ads Pipeline
◉ Stream Computing
○ Source & Sink
○ Stateful Processing in Heron (coming soon)
◉ Reliable Cross Datacenter Replication
Scale DistributedLog at Twitter
◉ 1.5 trillion records/day, 17.5 petabytes/day
◉ O(10) thousand streams, O(1) million live ledgers
◉ O(10^2) bookies, O(10^3) proxies
◉ Record sizes range from 100 bytes to 20 KB and beyond
◉ Data is kept from hours to days, even up to a year
◉ Replication factor is 3 or 5; 9 or 15 for the global use case
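For a feel of what the daily totals mean per second, here is a back-of-the-envelope calculation. It assumes decimal petabytes (10^15 bytes) and an evenly spread load, and the slide does not say whether the byte count includes replication or reads, so the derived average is only a rough ratio, not a measured record size.

```python
SECONDS_PER_DAY = 24 * 60 * 60

records_per_day = 1.5e12  # 1.5 trillion records/day (from the slide)
bytes_per_day = 17.5e15   # 17.5 petabytes/day, assuming 1 PB = 10**15 bytes

# Average rates, assuming the load is spread evenly over the day.
records_per_second = records_per_day / SECONDS_PER_DAY
bytes_per_second = bytes_per_day / SECONDS_PER_DAY

# Ratio of the two totals; meaningful as a record size only if both
# figures count the same traffic, which the slide does not state.
bytes_per_record = bytes_per_day / records_per_day

print(f"{records_per_second / 1e6:.1f} million records/s")
print(f"{bytes_per_second / 1e9:.1f} GB/s")
print(f"{bytes_per_record / 1e3:.1f} KB per record")
```

The rates come out at roughly 17 million records and about 200 GB every second.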
DistributedLog Resources
◉ Website - https://distributedlog.io
◉ Mail List - dev@distributedlog.incubator.apache.org
◉ Project Ideas - https://cwiki.apache.org/confluence/display/DL/Project+Ideas
◉ Paper - “DistributedLog: A high performance replicated log service” (ICDE 2017)
Yahoo! Pulsar
(Cloud Messaging Service)
Yahoo! Pulsar
◉ Distributed Pub/Sub Messaging Platform
◉ Flexible Messaging Model - Topic and Queue
◉ Durable, Low Latency
◉ Strong Ordering and Consistency Guarantees
◉ Geo Replication
◉ Apache BookKeeper as Durable Message Store
Yahoo! Pulsar
Scale Pulsar at Yahoo!
◉ 100 billion messages per day
◉ More than 1.4 million topics
◉ Avg publish latency across services of less than 5ms
◉ 10+ data centers, cross-region replication
Pulsar Performance
Salesforce Distributed Store
Salesforce Application Storage
◉ Store for Persistent WAL, Data and Objects
◉ Low, Constant Write Latencies
◉ Low, Constant Random Read Latencies
◉ Highly Available, Consistent
◉ Distributed and Linearly Scalable
◉ On Commodity Hardware
Heterogeneous Stores
Roadmap, Releases, Future
Community
Community
◉ 7 PMC Members
◉ 10+ Committers
◉ 20+ Active Contributors
◉ 5+ Companies actively using/contributing
○ Twitter
○ Yahoo!
○ Salesforce
○ Huawei
○ EMC
Release 4.5.0
◉ Netty 4 Upgrade - Performance Improvements
◉ Security (Authentication & Authorization) Support
◉ Explicit LAC
◉ Long Poll Read Support
◉ Auto Re-replication Improvements
◉ ...
Future
◉ Scalable Segment Store
○ Object, Log, File, Stream, …
◉ Long Term Storage
○ Disk Scrubber
○ Better Lifecycle Management
○ …
◉ Beyond the limit
○ 128-bit support
○ Scalable metadata management
Any questions?
You can find me at
◉ @sijieg
◉ guosijie@gmail.com
Thanks!
