CQRS (Command Query Responsibility Segregation) is a pattern, which separates the process of querying and updating data. As a query only returns data without any side effects, a command is designed to change data. CQRS is often combined with Event Sourcing. This is an architecture in which all changes to an application state are stored as a sequence of events.
Because of its great capability to store time series data Cassandra is the perfect fit for implementing the event store. But there a still a lot of open questions: What about the data modeling? What techniques will be used to process and store data in the Cassandra database? How to access the current state of the application, without replaying every event? And what about failure handling?
In this talk, I will give a brief introduction to CQRS and the Event Sourcing pattern and will then answer the questions above using a real life example of a data store for customer data.
5. •Solution needs to be highly scalable
(up to 100.000 reads/s, 10.000 writes/s)
•Read and write access needs to be low latency
•Read/write ratio is 10:1 or higher
•Solution needs to deal with up to 500.000.000
customers
Assumptions_
5
7. Traditional Pattern: Saving Application State_
7
Store
ID
Address
Article
Name
StockSize updateInventory()
getInventory()
sells
8. A series of sales and replenishments for
• a tablet
• Starting with 60, sell 20, replenish 10
• a stove
• Starting with 25, sell 5, no replenishments
What is different with Event Sourcing?_
8
9. Saving only application state
What is the Difference?_
9
:ArticleInventory
Fancy Tablet
50
:ArticleInventory
Gas Stove
20
10. Saving events instead of state
What is the Difference?_
10
:ArticleInventory
Fancy Tablet
39
15-08-14T19:..
:ArticleInventory
Gas Stove
20
15-08-14T19:..
:ArticleInventory
Fancy Tablet
45
15-08-14T19:..
:ArticleInventory
Gas Stove
20
15-08-14T19:..
:ArticleInventory
Fancy Tablet
50
15-08-14T19:..
:ArticleInventory
Gas Stove
20
15-08-14T19:..
11. •Log of all stock changes
•Complete rebuild of the state
•Temporal query
•Event replay and rollback
Benefits of Storing Events_
11
15. •The pattern is simple
•Going further
• Split up the domain model
• Independent scaling of models
• Not using a query model at all
• Different databases for models
A Pattern Changing Your Mindset_
15
16. Event Sourcing & CQRS_
16
Command
Services
Command
Model
ReadLayer
Query
Services
Query
Services
Query
Services Asynchronous
DB
Event Store
Query
Stores
ProcessorEvent
Processor
DB
DB
DB
18. •Not only an event sink
• Compaction
• Selective replay
•No single point of failure
•Horizontal scale & Geo Replication
•Write ahead of unmodified data
•Plays well with further processing
•Open source & a huge community
•Easy operations
Why Cassandra…
18
19. For accessing all entities of a given type
Event Store_
19
CREATE TABLE event_source_by_type (
entity_type TEXT,
bucket INT,
entity_key TEXT,
insert_time TIMESTAMP,
update_time TIMESTAMP,
payload TEXT,
PRIMARY KEY((entity_type,bucket),insert_time,entity_key)
)
WITH CLUSTERING ORDER BY (created_at DESC,entity_key ASC);
e.g. as JSON, XML, protobuf, Avro
prevent huge partitions
20. CREATE TABLE event_source_by_key (
entity_type TEXT,
entity_key TEXT,
insert_time TIMESTAMP,
update_time TIMESTAMP,
payload TEXT,
PRIMARY KEY((entity_type,entity_key),created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
For accessing an entity directly
Optional: Second Table_
20
e.g. as JSON, XML or protobuf
21. •Create tables that fit your queries!
•E.g. „Get articles in category ‚computer‘“
Query Stores_
21
CREATE TABLE articles_by_category (
category TEXT PRIMARY KEY,
article_id UUID,
article_info TEXT
);
may need bucketing
could also be a
JSON document
22. Query Stores_
22
„I need ad-hoc queries“
„I need specific queries with
a lot of different filters“
25. •Command model triggers event processor
•Event processor updates query views
From Event Store to Query Store_
25
Command
Model
Event
Processor DB
DB
DB
Event
Processor
Event
Processor
27. •Easy scale out
•Easy deployment
•Intuitive Scala & Java API
•Fault tolerant
•Out-of-the-box Kafka adapter
•Integrates well with Cassandra
Why Spark?
27
28. •Spark Streaming application
•Consumes only topics of interest
•Joins the stream of events with the current view
• Use primary key of entity for correlation
• Use joinWithCassandraTable
Spark Job in Detail_
28
29. 1. Create a table for the query view
2. Create a Spark job filling your table
3. Deploy the Spark job
4. Init reprocess of the event DB
• same transformation logic as in normal processing
• source can be different
5. Mark view as initialized
If you need a new query view_
29
Query
DB
Event
DB
31. •Scalability
• On storage & processing: just add nodes
• Efficient queries due to separation
•Collaboration
• Every client gets its own data access
• Easy to support new queries
Benefits_
31
32. •More complexity than simple CRUD
•Side effects on event replay
•Eventual consistency in query views
•Concurrent writes
•Performance of replay
Pitfalls_
32
33. Lost Updates
•Due to parallel processing
• Two events A and B as sequential input
• A is processed after B
•Solution
• Partition Spark RDD by entity key
• Use a lambda architecture
Pitfalls_
33
speed
Data
Stream
Serving
Layer
batch
34. •Event Store Compaction
• Compact store to improve processing time
• Only store latest entry of a entity key
• e.g. a Spark batch job / Cassandra TTL
•Snapshot / Master State
• Constantly build a complete state of all data
• Can be used
• To speed up initialization
• As a store for a search engine
Pitfalls_
34