SlideShare une entreprise Scribd logo
1  sur  14
Pulsar Virtual Summit North America 2021
Apache BookKeeper State Store:
A Durable Key-Value Store
Pulsar Virtual Summit North America 2021
Prashant Kumar
Principal Software Engineer @ Splunk
● Principal Software Developer at Splunk.
● Ex Yahoo, Verizon Media.
● Prior Experience, Key member and
contributor to Sherpa, the geographically
replicated multi tenant key value store @
Yahoo.
Pulsar Virtual Summit North America 2021
Agenda
I. Introduction to Apache Bookkeeper Statestore.
II. Why yet another KV store?
III. How does it fit into Pulsar ecosystem?
IV. Intended use case.
V. Brief architecture.
VI. Current state and production worthiness.
VII. Product roadmap and future work.
Pulsar Virtual Summit North America 2021
Apache Bookkeeper Statestore
● It’s a Key-Value store.
● It’s durable.
● It’s locally replicated.
● It’s eventually consistent.
● It’s fault tolerant.
● It’s cloud native and k8s based deployment.
Pulsar Virtual Summit North America 2021
Argh!. Yet another KV store?
Pulsar Virtual Summit North America 2021
Integral part of Apache Pulsar ecosystem
● Uses same Zookeeper deployment that Pulsar uses.
● Uses same Bookkeeper deployment that Pulsar uses.
● Uses same infrastructure for metrics, dashboards etc as
Bookkeeper
● Part of bookkeeper/stream code base.
● Existing client side integration in Apache Pulsar function
service.
Pulsar Virtual Summit North America 2021
Primary use cases
● Store and access function state and checkpoints
● A secondary metadata store for Apache Pulsar, away from
Zookeeper
● Other various KV store use cases
Pulsar Virtual Summit North America 2021
Data model
Pulsar Virtual Summit North America 2021
High level serving architecture (Bird view)
Pulsar Virtual Summit North America 2021
High level Datastore architecture
Pulsar Virtual Summit North America 2021
Benchmarking
● Benchmarking with YCSB
● Setup
○ YCSB Thread count = 40
○ # k8s pods = 3
○ cpuRequest = 8
○ cpuLimit = 16
○ memoryRequest = 24Gi
○ memoryLimit = 24Gi
● Read output
○ Throughput = 22557.7 Ops/S
○ Average latency = 1.699 ms
○ 99%tile latency = 5.323 ms
● Write output
○ Throughput = 15256.16 Ops/S
○ Average latency = 8.820 ms
○ 99%tile = 27.071 ms
Pulsar Virtual Summit North America 2021
Production readiness
● It has already been in production for last few
months
● It’s a k8s based deployment
● Sustained production traffic
○ Read throughput 240 Ops/S
○ Write throughput 90 Ops/S
Pulsar Virtual Summit North America 2021
Product roadmap
● Contribute internal changes back to open source
● Storage hardening
○ Pick up the data litter actively and reactively
○ Clear obsolete transaction log
● Operability
○ Improved and granular monitoring and alerting.
● Availability
○ Implementation of replica.
○ Serving read traffic from a replica.
○ Elevation of replica to be primary when primary fails
● Scaling and load balancing
○ Splitting a shard
○ Moving shards across cluster
Pulsar Virtual Summit North America 2021
References
● Statestore Repo:
https://github.com/apache/bookkeeper/tree/master/stream
● Distributedlog Repo:
https://github.com/apache/bookkeeper/tree/master/stream/distri
butedlog
● Pulsar Function - Statestore integration:
https://github.com/apache/pulsar/blob/master/pulsar-
functions/worker/src/main/java/org/apache/pulsar/functions/wor
ker/PulsarWorkerService.java#L420

Contenu connexe

Tendances

Tendances (20)

Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesApache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Multi cluster, multitenant and hierarchical kafka messaging service slideshare
Multi cluster, multitenant and hierarchical kafka messaging service   slideshareMulti cluster, multitenant and hierarchical kafka messaging service   slideshare
Multi cluster, multitenant and hierarchical kafka messaging service slideshare
 
Apache pulsar - storage architecture
Apache pulsar - storage architectureApache pulsar - storage architecture
Apache pulsar - storage architecture
 
Reliability Guarantees for Apache Kafka
Reliability Guarantees for Apache KafkaReliability Guarantees for Apache Kafka
Reliability Guarantees for Apache Kafka
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
Grafana Loki: like Prometheus, but for Logs
Grafana Loki: like Prometheus, but for LogsGrafana Loki: like Prometheus, but for Logs
Grafana Loki: like Prometheus, but for Logs
 
Deep Dive into Building Streaming Applications with Apache Pulsar
Deep Dive into Building Streaming Applications with Apache Pulsar Deep Dive into Building Streaming Applications with Apache Pulsar
Deep Dive into Building Streaming Applications with Apache Pulsar
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
 
High Availability and Disaster Recovery in PostgreSQL - EQUNIX
High Availability and Disaster Recovery in PostgreSQL - EQUNIXHigh Availability and Disaster Recovery in PostgreSQL - EQUNIX
High Availability and Disaster Recovery in PostgreSQL - EQUNIX
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
 
Apache Pulsar Development 101 with Python
Apache Pulsar Development 101 with PythonApache Pulsar Development 101 with Python
Apache Pulsar Development 101 with Python
 
MySQL/MariaDB Proxy Software Test
MySQL/MariaDB Proxy Software TestMySQL/MariaDB Proxy Software Test
MySQL/MariaDB Proxy Software Test
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 

Similaire à Apache BookKeeper State Store: A Durable Key-Value Store - Pulsar Summit NA 2021

Benchmarking for postgresql workloads in kubernetes
Benchmarking for postgresql workloads in kubernetesBenchmarking for postgresql workloads in kubernetes
Benchmarking for postgresql workloads in kubernetes
DoKC
 
How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...
How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...
How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...
StreamNative
 
Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...
Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...
Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...
Nitin Kumar
 

Similaire à Apache BookKeeper State Store: A Durable Key-Value Store - Pulsar Summit NA 2021 (20)

Manage Pulsar Cluster Lifecycles with Kubernetes Operators - Pulsar Summit NA...
Manage Pulsar Cluster Lifecycles with Kubernetes Operators - Pulsar Summit NA...Manage Pulsar Cluster Lifecycles with Kubernetes Operators - Pulsar Summit NA...
Manage Pulsar Cluster Lifecycles with Kubernetes Operators - Pulsar Summit NA...
 
Benchmarking for postgresql workloads in kubernetes
Benchmarking for postgresql workloads in kubernetesBenchmarking for postgresql workloads in kubernetes
Benchmarking for postgresql workloads in kubernetes
 
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDBScylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDB
 
Infinitic: Building a Workflow Engine on Top of Pulsar - Pulsar Summit NA 2021
 Infinitic: Building a Workflow Engine on Top of Pulsar - Pulsar Summit NA 2021 Infinitic: Building a Workflow Engine on Top of Pulsar - Pulsar Summit NA 2021
Infinitic: Building a Workflow Engine on Top of Pulsar - Pulsar Summit NA 2021
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
In-memory No SQL- GIDS2014
In-memory No SQL- GIDS2014In-memory No SQL- GIDS2014
In-memory No SQL- GIDS2014
 
Critical Attributes for a High-Performance, Low-Latency Database
Critical Attributes for a High-Performance, Low-Latency DatabaseCritical Attributes for a High-Performance, Low-Latency Database
Critical Attributes for a High-Performance, Low-Latency Database
 
How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...
How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...
How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...
 
Oracle big data appliance and solutions
Oracle big data appliance and solutionsOracle big data appliance and solutions
Oracle big data appliance and solutions
 
The architecture of SkySQL
The architecture of SkySQLThe architecture of SkySQL
The architecture of SkySQL
 
Architectural caching patterns for kubernetes
Architectural caching patterns for kubernetesArchitectural caching patterns for kubernetes
Architectural caching patterns for kubernetes
 
Function Mesh: Complex Streaming Jobs Made Simple - Pulsar Summit NA 2021
Function Mesh: Complex Streaming Jobs Made Simple - Pulsar Summit NA 2021Function Mesh: Complex Streaming Jobs Made Simple - Pulsar Summit NA 2021
Function Mesh: Complex Streaming Jobs Made Simple - Pulsar Summit NA 2021
 
Learn from HomeAway Hadoop Development and Operations Best Practices
Learn from HomeAway Hadoop Development and Operations Best PracticesLearn from HomeAway Hadoop Development and Operations Best Practices
Learn from HomeAway Hadoop Development and Operations Best Practices
 
[DevConf.US 2019]Quarkus Brings Serverless to Java Developers
[DevConf.US 2019]Quarkus Brings Serverless to Java Developers[DevConf.US 2019]Quarkus Brings Serverless to Java Developers
[DevConf.US 2019]Quarkus Brings Serverless to Java Developers
 
Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...
Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...
Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...
 
Ohio Devfest - Visual Analysis with GCP
Ohio Devfest - Visual Analysis with GCPOhio Devfest - Visual Analysis with GCP
Ohio Devfest - Visual Analysis with GCP
 
YARN
YARNYARN
YARN
 
SVC / Storwize analysis cost effective storage planning (use case)
SVC / Storwize analysis cost effective storage planning (use case)SVC / Storwize analysis cost effective storage planning (use case)
SVC / Storwize analysis cost effective storage planning (use case)
 
Retour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenantRetour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenant
 
Architectural caching patterns for kubernetes
Architectural caching patterns for kubernetesArchitectural caching patterns for kubernetes
Architectural caching patterns for kubernetes
 

Plus de StreamNative

Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
StreamNative
 

Plus de StreamNative (20)

Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
 
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
 
Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...
 
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
 
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
 
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
 
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
 
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
 
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
 
Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
 
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
 
Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022
 
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
 
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
 
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
 
Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
 
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
 

Dernier

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Dernier (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

Apache BookKeeper State Store: A Durable Key-Value Store - Pulsar Summit NA 2021

  • 1. Pulsar Virtual Summit North America 2021 Apache BookKeeper State Store: A Durable Key-Value Store
  • 2. Pulsar Virtual Summit North America 2021 Prashant Kumar Principal Software Engineer @ Splunk ● Principal Software Developer at Splunk. ● Ex Yahoo, Verizon Media. ● Prior Experience, Key member and contributor to Sherpa, the geographically replicated multi tenant key value store @ Yahoo.
  • 3. Pulsar Virtual Summit North America 2021 Agenda I. Introduction to Apache Bookkeeper Statestore. II. Why yet another KV store? III. How does it fit into Pulsar ecosystem? IV. Intended use case. V. Brief architecture. VI. Current state and production worthiness. VII. Product roadmap and future work.
  • 4. Pulsar Virtual Summit North America 2021 Apache Bookkeeper Statestore ● It’s a Key-Value store. ● It’s durable. ● It’s locally replicated. ● It’s eventually consistent. ● It’s fault tolerant. ● It’s cloud native and k8s based deployment.
  • 5. Pulsar Virtual Summit North America 2021 Argh!. Yet another KV store?
  • 6. Pulsar Virtual Summit North America 2021 Integral part of Apache Pulsar ecosystem ● Uses same Zookeeper deployment that Pulsar uses. ● Uses same Bookkeeper deployment that Pulsar uses. ● Uses same infrastructure for metrics, dashboards etc as Bookkeeper ● Part of bookkeeper/stream code base. ● Existing client side integration in Apache Pulsar function service.
  • 7. Pulsar Virtual Summit North America 2021 Primary use cases ● Store and access function state and checkpoints ● A secondary metadata store for Apache Pulsar, away from Zookeeper ● Other various KV store use cases
  • 8. Pulsar Virtual Summit North America 2021 Data model
  • 9. Pulsar Virtual Summit North America 2021 High level serving architecture (Bird view)
  • 10. Pulsar Virtual Summit North America 2021 High level Datastore architecture
  • 11. Pulsar Virtual Summit North America 2021 Benchmarking ● Benchmarking with YCSB ● Setup ○ YCSB Thread count = 40 ○ # k8s pods = 3 ○ cpuRequest = 8 ○ cpuLimit = 16 ○ memoryRequest = 24Gi ○ memoryLimit = 24Gi ● Read output ○ Throughput = 22557.7 Ops/S ○ Average latency = 1.699 ms ○ 99%tile latency = 5.323 ms ● Write output ○ Throughput = 15256.16 Ops/S ○ Average latency = 8.820 ms ○ 99%tile = 27.071 ms
  • 12. Pulsar Virtual Summit North America 2021 Production readiness ● It has already been in production for last few months ● It’s a k8s based deployment ● Sustained production traffic ○ Read throughput 240 Ops/S ○ Write throughput 90 Ops/S
  • 13. Pulsar Virtual Summit North America 2021 Product roadmap ● Contribute internal changes back to open source ● Storage hardening ○ Pick up the data litter actively and reactively ○ Clear obsolete transaction log ● Operability ○ Improved and granular monitoring and alerting. ● Availability ○ Implementation of replica. ○ Serving read traffic from a replica. ○ Elevation of replica to be primary when primary fails ● Scaling and load balancing ○ Splitting a shard ○ Moving shards across cluster
  • 14. Pulsar Virtual Summit North America 2021 References ● Statestore Repo: https://github.com/apache/bookkeeper/tree/master/stream ● Distributedlog Repo: https://github.com/apache/bookkeeper/tree/master/stream/distri butedlog ● Pulsar Function - Statestore integration: https://github.com/apache/pulsar/blob/master/pulsar- functions/worker/src/main/java/org/apache/pulsar/functions/wor ker/PulsarWorkerService.java#L420

Notes de l'éditeur

  1. Title page
  2. Speaker introduction
  3. Title page
  4. Title page
  5. Title page
  6. Title page