Data Modeling Examples
Matthew F. Dennis // @mdennis
Overview
● general guiding goals for Cassandra data models
● interesting and/or common examples/questions to get us started
● should be plenty of time at the end for questions, so bring them up if you have them!
Data Modeling Goals
● Keep data that is queried together stored together on disk
● More generally, think about the efficiency of querying your data and work backward from there to a model in Cassandra
● Usually, you shouldn't try to normalize your data (contrary to common practice with relational databases)
● It is usually better to keep a record that something happened than to overwrite a value (not always the best approach, though)
Time Series Data
● Easily the most common use of Cassandra
  – financial tick data
  – click streams
  – sensor data
  – performance metrics
  – GPS data
  – event logs
  – etc, etc, etc ...
● All of the above are essentially the same as far as C* is concerned
Time Series Thought Model
● Things happen in some timestamp-ordered stream and consist of values associated with the given timestamp (i.e. “data points”)
  – every 30 seconds, record location, speed, heading and engine temp
  – every 5 minutes, record CPU, IO and memory usage
● We are interested in recreating, aggregating and/or analyzing arbitrary time slices of the stream
  – Where was agent:007 and what was he doing between 11:21am and 2:38pm yesterday?
  – What are the last N actions foo did on my site?
Data Points Defined
● Each data point has 1-N values
● Each data point corresponds to a specific point in time or to an interval/bucket (e.g. the 5th minute of the 17th hour on some date)
Data Points Mapped to Cassandra
● Row Key is the id of the data point stream, bucketed by time
  – e.g. plane01:jan_2011 or plane01:jan_01_2011 for month or day buckets respectively
● Column Name is TimeUUID(timestamp of data point)
● Column Value is the serialized data point
  – JSON, XML, pickle, msgpack, thrift, protobuf, avro, BSON, WTFe
● Bucketing
  – avoids always requiring multiple seeks when only small slices of the stream are requested (e.g. the stream is 5 years old but I'm only interested in Jan 5th three years ago and/or yesterday between 2pm and 3pm)
  – makes it easy to lazily aggregate old stream activity
  – reduces compaction overhead, since old rows will never have to be merged again (until you “back fill” and/or delete something)
A Slightly More Concrete Example
● Sensor data from airplanes
● Every 30 seconds each plane sends latitude+longitude, altitude and wine remaining in mdennis' glass
The Visual

Row p5:j11 (i.e. plane5:jan_2011):
  TimeUUID0 → 28.90, 124.30 / 45K feet / 70%
  TimeUUID1 → 28.85, 124.25 / 44K feet / 50%
  TimeUUID2 → 28.81, 124.22 / 44K feet / 95%
(middle of the ocean and half a glass of wine at 44K feet)

● Row Key is the id of the stream being recorded (e.g. plane5:jan_2011)
● Column Name is the timestamp (or TimeUUID) associated with the data point
● Column Value is the value of the event (e.g. protobuf-serialized lat/long + alt + wine_level)
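A minimal write sketch of this layout using the Thrift-era pycassa client; the keyspace/column family names, the JSON serialization and the month-bucketing helper are assumptions for illustration (the CF is assumed to compare columns as TimeUUIDType):

```python
import json
import time

import pycassa
from pycassa.util import convert_time_to_uuid

# assumed keyspace and column family names
pool = pycassa.ConnectionPool('Flights', ['localhost:9160'])
events = pycassa.ColumnFamily(pool, 'TimeSeries')

def record_data_point(plane_id, lat, lon, alt_feet, wine_pct):
    now = time.time()
    # row key = stream id bucketed by month, e.g. plane5:jan_2011
    row_key = '%s:%s' % (plane_id, time.strftime('%b_%Y', time.gmtime(now)).lower())
    col_name = convert_time_to_uuid(now)   # TimeUUID(timestamp of data point)
    col_value = json.dumps({'lat': lat, 'long': lon,
                            'alt': alt_feet, 'wine': wine_pct})
    events.insert(row_key, {col_name: col_value})

record_data_point('plane5', 28.90, 124.30, 45000, 0.70)
```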
Querying
● When querying, construct TimeUUIDs for the min/max of the time range in question and use them as the start/end in your get_slice call
● Or use an empty start and/or end along with a count
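For example, with pycassa (whose get wraps get_slice), convert_time_to_uuid can build the lowest/highest possible TimeUUIDs for the range endpoints; this sketch reuses the illustrative names from the write example above:

```python
from pycassa.util import convert_time_to_uuid

def slice_stream(row_key, t_start, t_end, count=1000):
    # lowest_val picks the smallest/largest possible TimeUUID for a timestamp,
    # so the slice covers the full [t_start, t_end] range
    start = convert_time_to_uuid(t_start, lowest_val=True)
    finish = convert_time_to_uuid(t_end, lowest_val=False)
    return events.get(row_key, column_start=start, column_finish=finish,
                      column_count=count)

# e.g. everything between 2pm and 3pm yesterday (timestamps assumed computed)
# cols = slice_stream('plane5:jan_2011', t_2pm, t_3pm)
```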
Bucket Sizes?
● Depends greatly on:
  – average size of the time slice queried
  – average data point size
  – write rate of data points to a stream
  – IO capacity of the nodes
So... Bucket Sizes?
● No bigger than a few GB per row
  – row size ≈ bucket_size * write_rate * sizeof(avg_data_point)
● Bucket size >= average size of the time slice queried
● No more than maybe 10M entries per row
● No more than a month if you have lots of different streams
● NB: there are exceptions to all of the above, which are really nothing more than guidelines
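A quick back-of-the-envelope check of that formula with illustrative numbers (not from the talk): one ~100-byte data point every 30 seconds over a one-month bucket stays comfortably within these guidelines:

```python
write_rate = 1 / 30.0          # data points per second (one every 30s)
avg_data_point = 100           # bytes per serialized data point
bucket_size = 30 * 24 * 3600   # seconds in a one-month bucket

entries = bucket_size * write_rate      # ~86,400 columns per row
row_bytes = entries * avg_data_point    # ~8.6 MB per row
print(entries, row_bytes)               # well under 10M entries / a few GB
```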
Ordering
● In cases where the most recent data is the most interesting (e.g. the last N events for entity foo or the last hour of events for entity bar), you can reverse the comparator (i.e. sort descending instead of ascending)
  – http://thelastpickle.com/2011/10/03/Reverse-Comparators/
  – https://issues.apache.org/jira/browse/CASSANDRA-2355
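Even without a reversed comparator, most clients can walk a row in descending order at query time; in pycassa that is the column_reversed flag. A sketch, reusing the illustrative names above (note this is a per-query reversal, not the on-disk reverse-comparator technique the links describe):

```python
def last_n_events(row_key, n):
    # descending TimeUUID order: newest columns first
    return events.get(row_key, column_count=n, column_reversed=True)

# last 5 data points for plane5's january bucket
# recent = last_n_events('plane5:jan_2011', 5)
```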
Spanning Buckets
● If your time slice spans buckets, you'll need to construct all the row keys in question (i.e. number of unique row keys = spans + 1)
● If you want all the results between the dates, pass all the row keys to multiget_slice with the start and end of the desired time slice
● If you only want the first N results within your time slice, the lowest latency comes from multiget_slice as above, but the best efficiency comes from serially paging one row key at a time until your desired count is reached
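A sketch of the multiget path with pycassa (multiget wraps multiget_slice); the day-bucketed key format and helper name are assumptions for illustration:

```python
from datetime import date, timedelta
from pycassa.util import convert_time_to_uuid

def slice_across_day_buckets(stream_id, t_start, t_end):
    # one row key per day bucket touched by [t_start, t_end]
    day = date.fromtimestamp(t_start)
    last = date.fromtimestamp(t_end)
    keys = []
    while day <= last:
        keys.append('%s:%s' % (stream_id, day.strftime('%b_%d_%Y').lower()))
        day += timedelta(days=1)
    start = convert_time_to_uuid(t_start, lowest_val=True)
    finish = convert_time_to_uuid(t_end, lowest_val=False)
    return events.multiget(keys, column_start=start, column_finish=finish)
```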
Expiring Streams (e.g. “I only care about the past year”)
● Just set the TTL to the age you want to keep
● yeah, that's pretty much it ...
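In pycassa the TTL rides along on the insert; a one-line sketch reusing the names from the write example above (the one-year figure is illustrative):

```python
ONE_YEAR = 365 * 24 * 3600  # seconds

# the column silently disappears once it is a year old
events.insert(row_key, {col_name: col_value}, ttl=ONE_YEAR)
```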
Counters
● Sometimes you're only interested in counting things that happened within some time slice
● Minor adaptation of the previous content to use counters (be aware they are not idempotent)
  – column names become buckets
  – values become counters
Example: Counting User Logins

Row user3:system5:logins:by_day (U3:S5:L:D):
  20110107 → 2   (2 logins on Jan 7th 2011 for user 3 on system 5)
  ...
  20110523 → 7   (7 logins on May 23rd 2011 for user 3 on system 5)

Row user3:system5:logins:by_hour (U3:S5:L:H):
  2011010710 → 1 (one login for user 3 on system 5 on Jan 7th 2011 for the 10th hour)
  ...
  2011052316 → 2 (2 logins for user 3 on system 5 on May 23rd 2011 for the 16th hour)
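A sketch with pycassa's counter support (add wraps the Thrift counter increment); the CF name Logins is an assumption, and the key scheme mirrors the slide:

```python
import time
import pycassa

# a CF with default_validation_class=CounterColumnType is assumed
logins = pycassa.ColumnFamily(pool, 'Logins')

def record_login(user, system, ts=None):
    ts = ts or time.time()
    day = time.strftime('%Y%m%d', time.gmtime(ts))
    hour = time.strftime('%Y%m%d%H', time.gmtime(ts))
    # column name is the time bucket, value is a counter incremented by 1
    logins.add('%s:%s:logins:by_day' % (user, system), day)
    logins.add('%s:%s:logins:by_hour' % (user, system), hour)
```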
Eventually Atomic
● In a legacy RDBMS, atomicity is “easy”
● Attempting full ACID compliance in distributed systems is a bad idea (and actually impossible in the strictest sense)
● However, consistency is important and can certainly be achieved in C*
● Many approaches / alternatives
● I like a transaction log approach, especially in the context of C*
Transaction Logs (in this context)
● Record what is going to be performed before it is actually performed
● Perform the actions that need to be atomic (in the indivisible sense, not the all-at-once sense, which is usually what people mean when they say isolation)
● Mark that the actions were performed
In Cassandra
● Serialize all actions that need to be performed into a single column
  – JSON, XML, YAML (yuck!), pickle, JSO, msgpack, protobuf, et cetera
● Row Key = randomly chosen C* node token
● Column Name = TimeUUID(nowish)
● Perform the actions
● Delete the column
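A sketch of that write/perform/delete cycle; the CF name XACT_LOG matches the next slide, while the hard-coded token list, JSON serialization and perform_actions callback are assumptions for illustration:

```python
import json
import random
import time

import pycassa
from pycassa.cassandra.ttypes import ConsistencyLevel
from pycassa.util import convert_time_to_uuid

xact_log = pycassa.ColumnFamily(pool, 'XACT_LOG')
node_tokens = ['0', '85070591730234615865843651857942052864']  # assumed ring tokens

def atomically(actions, perform_actions):
    row_key = random.choice(node_tokens)      # randomly chosen node token
    col = convert_time_to_uuid(time.time())   # TimeUUID(nowish)
    # 1) durably record intent (QUORUM) before doing anything
    xact_log.insert(row_key, {col: json.dumps(actions)},
                    write_consistency_level=ConsistencyLevel.QUORUM)
    # 2) perform the (idempotent) actions
    perform_actions(actions)
    # 3) mark done by deleting the log entry
    xact_log.remove(row_key, columns=[col])
```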
Configuration Details
● Short gc_grace_seconds on the XACT_LOG column family (e.g. 5 minutes)
● Write to XACT_LOG at CL.QUORUM or CL.LOCAL_QUORUM for durability
  – if it fails with an unavailable exception, pick a different node token and/or node and try again (this gives the same semantics as a relational DB in terms of knowing the state of your transaction)
Failures
● Before insert into the XACT_LOG
● After insert, before actions
● After insert, in the middle of actions
● After insert, after actions, before delete
● After insert, after actions, after delete
Recovery
● Each C* node has a crond job, offset from every other by some time period
● Each job runs the same code: multiget_slice for all node tokens, for all columns older than some time period (the “recovery period”)
● Any columns found need to be replayed in their entirety and are deleted after replay (normally there are no columns, because normally things are working)
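A sketch of that recovery job, reusing the illustrative names from the XACT_LOG example above; the recovery-period cutoff and the replay callback are assumptions:

```python
RECOVERY_PERIOD = 600  # seconds; anything older than this is considered stuck

def recover(replay):
    # highest possible TimeUUID for (now - recovery period): the slice end
    cutoff = convert_time_to_uuid(time.time() - RECOVERY_PERIOD,
                                  lowest_val=False)
    # all node tokens, all columns older than the recovery period
    stuck = xact_log.multiget(node_tokens, column_finish=cutoff)
    for row_key, cols in stuck.items():
        for col, payload in cols.items():
            replay(json.loads(payload))   # actions are idempotent, safe to redo
            xact_log.remove(row_key, columns=[col])
```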
XACT_LOG Comments
● Idempotent writes are awesome (that's why this works so well)
● Doesn't work so well for counters (they're not idempotent)
● Clients must be able to deal with temporarily inconsistent data (they have to do this anyway)
Q?
Cassandra Data Modeling Examples
Matthew F. Dennis // @mdennis