Aerospike Data Modeling - Meetup Dec 2019
Proprietary & Confidential | All rights reserved. © 2019 Aerospike Inc.
▪ Cassandra databases, including derivatives such as ScyllaDB, have a
needle in a haystack problem
▪ In C* each user ID – segment ID pair is in its own row
▪ This affects performance when you need low latency key-value operations
▪ In Aerospike we keep all the segments together in a single record
tl;dr
▪ In digital advertising, user profile stores assist with audience segmentation
▪ The goal is to pull user segments for a specific user as fast as possible
▪ Modeling this use case is generally applicable to other forms of online
personalization
User Profile Stores
CREATE TABLE userspace.user_segments (
user_id uuid,
segment_id int,
attr smallint,
attr2 smallint,
PRIMARY KEY ((user_id, segment_id))
);
▪ On average 1000 segments per profile
▪ 50 billion cookies means 50 trillion rows
▪ High latency when finding one user's 1000 segments among such a huge number of rows
Modeling in Cassandra
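The row-count arithmetic on this slide can be checked in a few lines of Python. The figures come from the slide; the variable names are illustrative only, and the one-record-per-user count reflects the single-record approach described in the tl;dr.

```python
# Figures taken from the slide; the names are illustrative.
COOKIES = 50_000_000_000           # 50 billion user profiles (cookies)
AVG_SEGMENTS_PER_PROFILE = 1_000   # average number of segments per profile

# One row per (user_id, segment_id) pair in the Cassandra model.
cassandra_rows = COOKIES * AVG_SEGMENTS_PER_PROFILE

# One record per user when all segments are kept together.
single_record_count = COOKIES

print(f"{cassandra_rows:,} rows vs {single_record_count:,} records")
```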
{segmentID: [segment-TTL, {attr1, attr2}]}
{ 8457: [8889*, {}],
12845: [8889, {}],
42199: [8889, {}],
43696: [8889, {}],
}
▪ * Segment TTL is expressed in hours since a local (application-chosen) epoch
▪ The map ordering options are UNORDERED, K-ORDERED and KV-ORDERED
▪ Choosing K-ORDERED gives the best performance for data on SSD
Modeling in Aerospike
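As a rough illustration of the value layout above, the following Python sketch builds a `{segmentID: [segment-TTL, {attrs}]}` map in memory. The reference date `LOCAL_EPOCH` is an assumption (the slide does not say which date the local epoch is), and the helper names are hypothetical; a plain dict stands in for the server-side map.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# ASSUMPTION: the local epoch is 2019-01-01 UTC. The slide only says
# "hours since epoch" and does not give the reference date.
LOCAL_EPOCH = datetime(2019, 1, 1, tzinfo=timezone.utc)


def hours_since_local_epoch(moment: datetime) -> int:
    """Whole hours elapsed between LOCAL_EPOCH and `moment`."""
    return int((moment - LOCAL_EPOCH).total_seconds() // 3600)


def segment_value(expires_at: datetime, attrs: Optional[dict] = None) -> list:
    """Build the [segment-TTL, {attrs}] value stored under a segment ID."""
    return [hours_since_local_epoch(expires_at), attrs or {}]


# A profile map shaped like the one on the slide.
expiry = LOCAL_EPOCH + timedelta(hours=8889)
profile = {seg: segment_value(expiry) for seg in (8457, 12845, 42199, 43696)}
print(profile[8457])  # [8889, {}]
```

Storing the TTL as a small integer in the first list position is what makes the value-interval operations on later slides usable for freshness filtering and trimming.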
▪ We can easily upsert new user segments into the map as they are
processed (https://github.com/aerospike-examples/modeling-user-segmentation)
Advantages
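The upsert semantics can be sketched with a plain Python dict standing in for the record's map. This is only an in-memory model, not the Aerospike client API; `upsert_segment` is a hypothetical helper, and the TTL values are made up for illustration.

```python
def upsert_segment(profile: dict, segment_id: int, ttl_hours: int, attrs=None) -> None:
    """Insert the segment if absent, otherwise overwrite its TTL and attributes."""
    profile[segment_id] = [ttl_hours, attrs or {}]


profile = {8457: [8889, {}], 12845: [8889, {}]}
upsert_segment(profile, 42199, 8889)  # new segment: inserted
upsert_segment(profile, 8457, 8913)   # existing segment: TTL refreshed

# A K-ORDERED map keeps entries sorted by key; sorting here imitates that.
for segment_id in sorted(profile):
    print(segment_id, profile[segment_id])
```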
▪ We can use get_by_value_interval to filter segments that have a
specific ‘freshness’
Advantages
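A minimal in-memory model of the freshness filter is sketched below. It assumes the interval is inclusive at the start and exclusive at the end, and it compares only the TTL in the first list position, whereas the server compares whole values. The function name and sample TTLs are hypothetical.

```python
def get_by_ttl_interval(profile: dict, begin: int, end: int) -> dict:
    """Return the segments whose TTL satisfies begin <= ttl < end."""
    return {seg: val for seg, val in profile.items() if begin <= val[0] < end}


profile = {
    8457: [8889, {}],
    12845: [8700, {}],   # staler than the others
    42199: [8950, {}],
    43696: [8889, {}],
}

# Keep only segments whose TTL falls in [8889, 9000) hours since the local epoch.
fresh = get_by_ttl_interval(profile, 8889, 9000)
print(sorted(fresh))  # [8457, 42199, 43696]
```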
▪ We can use the map remove_by_value_interval operation to trim
expired segments, which can be called as a background scan operation (server >= 4.7)
▪ Most importantly, keeping a user's segments in a single record allows for orders of
magnitude faster retrieval from the user profile store. Just get the record.
Advantages
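Trimming expired segments can be modeled in the same spirit. The sketch below works on one in-memory map and is not the client API; a real deployment would apply the operation server-side across records. `remove_expired` and the sample values are hypothetical.

```python
def remove_expired(profile: dict, now_hours: int) -> int:
    """Drop segments whose TTL is earlier than now_hours; return the number removed."""
    expired = [seg for seg, val in profile.items() if val[0] < now_hours]
    for seg in expired:
        del profile[seg]
    return len(expired)


profile = {8457: [8889, {}], 12845: [8700, {}], 42199: [8950, {}]}

# Treat hour 8889 as "now": only the segment with TTL 8700 has expired.
removed = remove_expired(profile, now_hours=8889)
print(removed, sorted(profile))  # 1 [8457, 42199]
```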
Bitwise operations supported by the server. Method names in the clients might be different.
• General Write Flags: (create_only, update_only, no_fail, partial)
• resize()
• insert(), remove(), set()
• or(), and(), xor(), not()
• lshift(), rshift()
• add(), subtract(), set-integer()
• get(), count()
• lscan(), rscan()
• get-integer()
Bitwise Operations
▪ Represent the segments as a continuous bitfield
▪ Each segment ID is a bit position. Set the bit for every segment the user is in
▪ Bitwise operations to check server-side if user is in multiple segments
▪ Compresses extremely well in Enterprise Edition
▪ Caveat: can't apply a TTL to the segments
Modeling with Bitfields
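The bitfield idea can be sketched with a Python `bytearray` standing in for the blob. This is an in-memory model rather than the server's bitwise operations: the helper names are hypothetical, the maximum segment ID is an assumed value, and the sketch assumes bit offset 0 is the most significant bit of the first byte.

```python
MAX_SEGMENT_ID = 65_535  # ASSUMPTION: upper bound on segment IDs, chosen for illustration


def new_bitfield() -> bytearray:
    """Allocate a zeroed bitfield with one bit per possible segment ID."""
    return bytearray((MAX_SEGMENT_ID + 8) // 8)


def _mask(segment_id: int) -> int:
    # ASSUMPTION: bit offset 0 is the most significant bit of byte 0.
    return 0x80 >> (segment_id % 8)


def set_segment(bits: bytearray, segment_id: int) -> None:
    """Mark the user as a member of the segment."""
    bits[segment_id // 8] |= _mask(segment_id)


def in_all_segments(bits: bytearray, segment_ids) -> bool:
    """True only if every listed segment bit is set (an AND against a mask)."""
    mask = new_bitfield()
    for seg in segment_ids:
        set_segment(mask, seg)
    return all((b & m) == m for b, m in zip(bits, mask))


user_bits = new_bitfield()
for seg in (8457, 42199, 43696):
    set_segment(user_bits, seg)

print(in_all_segments(user_bits, [8457, 42199]))   # True
print(in_all_segments(user_bits, [8457, 12845]))   # False
```

At one bit per segment ID the whole profile fits in 8 KiB in this sketch, but as the caveat on the slide notes, a single bit leaves no room for a per-segment TTL.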
List & Map API
▪ https://www.aerospike.com/docs/guide/cdt-list.html
▪ https://www.aerospike.com/docs/guide/cdt-map.html
▪ https://www.aerospike.com/docs/guide/cdt-context.html
▪ https://www.aerospike.com/docs/guide/cdt-ordering.html
▪ https://aerospike-python-client.readthedocs.io/en/latest/aerospike_helpers.operations.html
▪ https://www.aerospike.com/apidocs/java/com/aerospike/client/cdt/ListOperation.html
Code Samples
▪ https://github.com/aerospike-examples/modeling-user-segmentation
Aerospike Training
▪ https://www.aerospike.com/training/
More material you can explore: