A look at how the scalable storage architecture of Apache Pulsar makes it possible to retain and access any length of event or message history in Pulsar.
6. Apache Pulsar
Messaging System
Pubsub and queuing
Strong guarantees on message persistence
Unlimited topic backlog size
Tiered Storage
Lightweight compute
SQL interface available
7. Anatomy of a topic backlog
A B C
Broker
Client produces message
Broker acknowledges message to client
Broker writes message to topic backlog
D E F G
Topic Backlog
8. Anatomy of a topic backlog
A B C D E F G
Closed Open
Topic Backlog
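The closed/open split on this slide can be sketched as a toy append-only log. This is only an illustration, not Pulsar's implementation: the class names, the `max_segment_size` parameter, and the rule of rolling over after a fixed number of messages are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """One chunk of the backlog; immutable once closed."""
    messages: List[str] = field(default_factory=list)
    closed: bool = False


class TopicBacklog:
    """Toy model: a log split into segments, where only the last one is open."""

    def __init__(self, max_segment_size: int = 3):
        # Hypothetical rollover threshold, counted in messages.
        self.max_segment_size = max_segment_size
        self.segments: List[Segment] = [Segment()]

    def append(self, message: str) -> None:
        current = self.segments[-1]
        if len(current.messages) >= self.max_segment_size:
            # The open segment is full: close it and start a new open one.
            current.closed = True
            current = Segment()
            self.segments.append(current)
        current.messages.append(message)

    def read_all(self) -> List[str]:
        # Readers see one continuous backlog regardless of segment boundaries.
        return [m for s in self.segments for m in s.messages]


backlog = TopicBacklog(max_segment_size=3)
for m in "ABCDEFG":
    backlog.append(m)
```

Only closed segments are candidates for being moved elsewhere, because nothing will be appended to them again.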
12. Mirroring
1/n = 0.5 space efficiency (n = 2 copies)
Tolerates n - 1 = 1 failure
Space efficiency gets worse with more replicas
Striping with parity
1 - 1/n = 0.833 space efficiency (n = 6)
Tolerates 1 failure
Space efficiency increases with n
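The two formulas on this slide can be checked with a few lines of arithmetic. The function names are made up for the example, and the values n = 2 and n = 6 are inferred from the 0.5 and 0.833 figures rather than stated on the slide.

```python
def mirroring(n: int) -> tuple:
    """n full copies: 1/n of raw space is usable, survives n - 1 failures."""
    return 1 / n, n - 1


def striping_with_parity(n: int) -> tuple:
    """n devices, one of them parity: 1 - 1/n usable, survives 1 failure."""
    return 1 - 1 / n, 1


mirror_efficiency, mirror_failures = mirroring(2)
stripe_efficiency, stripe_failures = striping_with_parity(6)

# Adding a third mirror lowers efficiency; widening the stripe raises it.
three_way_mirror_efficiency, _ = mirroring(3)
wider_stripe_efficiency, _ = striping_with_parity(10)
```

Both schemes here tolerate a single failure, but the striped layout leaves far more of the raw capacity usable.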
14.
1. Pulsar topic metadata updated with potential new location
2. All messages from segment copied to data object in long-term storage
3. Copying process tracks message offsets at defined interval
4. Offsets written to index object in long-term storage
5. Pulsar topic metadata updated with offload completed status
6. Original segment deleted from Pulsar storage after grace period
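The six steps above can be walked through with an in-memory simulation. This is a sketch under stated assumptions, not Pulsar's offloader: plain dictionaries stand in for Pulsar storage, long-term storage, and topic metadata, and every name (`offload_segment`, `delete_after_grace_period`, `index_interval`, the `/data` and `/index` keys) is hypothetical. Messages are concatenated without framing to keep the example short.

```python
import time


def offload_segment(segment_id, pulsar_storage, long_term_storage,
                    metadata, index_interval=2):
    data_key = f"{segment_id}/data"    # hypothetical object names
    index_key = f"{segment_id}/index"

    # Step 1: record the potential new location before copying anything.
    metadata[segment_id] = {"status": "offload_started", "location": data_key}

    # Steps 2 and 3: copy every message, noting the byte offset of every
    # index_interval-th message so readers can seek without a full scan.
    data = bytearray()
    index = {}
    for i, message in enumerate(pulsar_storage[segment_id]):
        if i % index_interval == 0:
            index[i] = len(data)
        data += message
    long_term_storage[data_key] = bytes(data)

    # Step 4: write the offsets as a separate index object.
    long_term_storage[index_key] = index

    # Step 5: mark the offload as completed in the topic metadata.
    metadata[segment_id]["status"] = "offload_completed"
    metadata[segment_id]["completed_at"] = time.time()


def delete_after_grace_period(segment_id, pulsar_storage, metadata,
                              grace_seconds, now=None):
    # Step 6: drop the original copy only once the grace period has passed.
    now = time.time() if now is None else now
    entry = metadata.get(segment_id, {})
    if (entry.get("status") == "offload_completed"
            and now - entry["completed_at"] >= grace_seconds):
        pulsar_storage.pop(segment_id, None)
        return True
    return False


pulsar_storage = {"seg-0": [b"aa", b"bbb", b"c", b"dddd"]}
long_term_storage = {}
metadata = {}
offload_segment("seg-0", pulsar_storage, long_term_storage, metadata)
```

Writing the metadata before and after the copy means an interrupted offload is visible as such, and the grace period keeps the original segment readable while clients switch over to the copied data.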
20. Apache Pulsar
• Unlimited backlog size
• Offload older data to cheaper storage
• Query with an SQL interface
Other use cases for massive backlogs
• CQRS Event sourcing
• Data marts
• Audit logs
• Security logs