A High-Level Pass
Through Redis Analytics*
by Josiah Carlson www.dr-josiah.com
@dr_josiah bit.ly/redis-in-action
Agenda
● Quick overview of Redis
● Monthly unique return/churn
○ too much memory method
○ reasonable memory method
○ very low memory method
● Visitor action sequence analytics
○ sequence method
○ low-memory method
● Geographic notifications with partitioning*
Quick Redis overview
● Remote key -> data structure server
○ Strings/integers/bitmaps
○ Lists of strings
○ Sets of unique string members
○ Hashes of key -> value
○ Sorted sets (ZSETs) mapping of member -> score
● Supports
○ Persistence
○ Replication
○ Publish/subscribe
○ Server-side Lua scripting (like a stored procedure)
○ Client-side sharding (server side in-progress)
Monthly unique return/churn
Problem:
● Say that you have millions of monthly visitors
● Need to know monthly churn, expected
~50%
● Don't want to waste too much memory
Monthly unique return/churn
Too much memory:
● Generate UUIDs for users, store in cookie
● Use a HASH mapping from UUIDs to int ids
● Use a HASH mapping from int ids to UUIDs
● Create a ZSET mapping short int ids to timestamps
● Use per-month bitmaps for churn calculation
● Recycle int ids based on old timestamps,
discarding UUIDs and resetting bits
Monthly unique return/churn
Drawbacks:
● Memory use based on size of HASHes and
ZSET (up to about 400 bytes/unique user)
● Second HASH can be thrown away
● The other HASH, ZSET, and bitmaps can be
thrown away and replaced by a "this month"
and "last month" SET (about 120 bytes/user)
● With 63-bit integer UUID and sharding
techniques, about 16 bytes/user
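With the "this month" / "last month" SET variant, churn falls out of a set intersection. A minimal sketch, assuming plain Python sets stand in for the two Redis SETs (in Redis the same counts could come from SCARD and SINTER); the function name `churn_rate` is made up for illustration:

```python
def churn_rate(last_month, this_month):
    """Fraction of last month's visitors who did not come back this month."""
    if not last_month:
        return 0.0  # no baseline, so report no churn rather than divide by zero
    returned = len(last_month & this_month)  # visitors present in both months
    return 1.0 - returned / len(last_month)


# Two of last month's four visitors returned, so churn is 50%.
print(churn_rate({"a", "b", "c", "d"}, {"a", "b", "x"}))
```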
Monthly unique return/churn
Reasonable memory solution:
● Store per-month id in a signed cookie (lower-32 is the
unique id for the month, next 8 is the month)
● One month of bitmap
● If this month cookie, do nothing
● If last month cookie and bit isn't set for that id, mark the
bitmap, generate a new cookie, increment unique and
returning counts
● If last month cookie and bit is set, generate a new
cookie
● If old cookie or no cookie, generate a new cookie,
increment unique count
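The cookie rules above can be sketched as follows. This is an illustration under stated assumptions, not the speaker's code: the HMAC signing scheme, the `SECRET` key, the class and function names, and the in-memory `bytearray` (standing in for a Redis bitmap driven by SETBIT, which returns the previous bit value) are all hypothetical.

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical signing key
ID_BITS = 32           # lower 32 bits: per-month unique id; next 8 bits: month


def _sign(raw):
    return hmac.new(SECRET, raw.encode(), hashlib.sha256).hexdigest()[:16]


def pack_cookie(month, uid):
    """Pack the month above the per-month id and append a signature."""
    value = ((month & 0xFF) << ID_BITS) | (uid & 0xFFFFFFFF)
    return f"{value}.{_sign(str(value))}"


def unpack_cookie(cookie):
    """Return (month, uid), or None if the cookie is missing or tampered with."""
    if not cookie or "." not in cookie:
        return None
    raw, sig = cookie.rsplit(".", 1)
    if not raw.isdigit():
        return None
    if not hmac.compare_digest(sig.encode(), _sign(raw).encode()):
        return None
    value = int(raw)
    return value >> ID_BITS, value & 0xFFFFFFFF


class MonthlyCounter:
    def __init__(self, month):
        self.month = month       # current month number, 0..255
        self.seen = bytearray()  # one bit per id issued last month
        self.next_id = 0
        self.unique = 0
        self.returning = 0

    def _new_cookie(self):
        uid, self.next_id = self.next_id, self.next_id + 1
        return pack_cookie(self.month, uid)

    def _test_and_set(self, uid):
        """Set the bit for uid and return its previous value (like SETBIT)."""
        byte, bit = divmod(uid, 8)
        if byte >= len(self.seen):
            self.seen.extend(bytes(byte + 1 - len(self.seen)))
        old = (self.seen[byte] >> bit) & 1
        self.seen[byte] |= 1 << bit
        return old

    def visit(self, cookie):
        """Apply the rules to one request; return the cookie to send back."""
        parsed = unpack_cookie(cookie)
        if parsed and parsed[0] == self.month:
            return cookie                    # this month's cookie: do nothing
        if parsed and parsed[0] == (self.month - 1) % 256:
            if not self._test_and_set(parsed[1]):
                self.unique += 1             # first return of this visitor
                self.returning += 1
            return self._new_cookie()        # bit already set: new cookie only
        self.unique += 1                     # old cookie or no cookie
        return self._new_cookie()
```

Churn for the month is then roughly 1 - returning / (last month's unique count), assuming last month's unique count was saved when the month rolled over.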
Monthly unique return/churn
Drawbacks:
● Memory use based on unique monthly
counts, ~1 bit per user (not bad)
● If you push to hundreds of millions/billions of
users, you should shard your bitmaps to
minimize realloc cost on bitmap updates
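One way to shard, sketched under assumptions: split the id space into fixed-size ranges so that each Redis key holds a bounded bitmap. The shard size and the key naming below are hypothetical choices, not something the talk specifies.

```python
SHARD_BITS = 1 << 23  # hypothetical shard size: 2**23 bits = 1 MiB per key


def shard_location(prefix, uid, shard_bits=SHARD_BITS):
    """Map a per-month id to (bitmap key, bit offset inside that bitmap)."""
    shard, offset = divmod(uid, shard_bits)
    return f"{prefix}:{shard}", offset


key, offset = shard_location("seen:2013-05", SHARD_BITS + 3)
print(key, offset)  # the pair to pass to SETBIT / GETBIT
```

Each bitmap then grows to at most 1 MiB, so no single update has to reallocate a very large string.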
Monthly unique return/churn
Very low memory method:
● Store per-month id in a signed cookie
● If this month cookie, do nothing
● If last month cookie, generate a new cookie
for the client, increment unique and returning
counts
● If old cookie or no cookie, generate a new
cookie, increment unique count
Monthly unique return/churn
Drawback:
● If someone sends you duplicate cookies,
hard to detect (keep "recently replaced"
cache, 5-10 minutes worth is likely good
enough)
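A "recently replaced" cache can be sketched as a small map with expiry. This is an assumption-laden illustration: the class name, the 600-second default, and the in-process dict are hypothetical; against Redis itself, a SET with the NX and EX options on a per-cookie key would likely serve the same purpose.

```python
import time


class RecentlyReplaced:
    """Remember cookies that were just exchanged, for `ttl` seconds."""

    def __init__(self, ttl=600.0):
        self.ttl = ttl
        self._expires = {}  # cookie -> time at which the entry expires

    def first_time(self, cookie, now=None):
        """Return True only the first time a cookie is seen within the window."""
        now = time.time() if now is None else now
        # Drop expired entries so the cache only holds the last few minutes.
        self._expires = {c: t for c, t in self._expires.items() if t > now}
        if cookie in self._expires:
            return False  # duplicate: do not count this visitor again
        self._expires[cookie] = now + self.ttl
        return True
```

Only increment the unique and returning counts when `first_time` returns True for the last-month cookie being replaced.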
Tangent on ZSETs
This slide is a filler so that I can talk about one
of my favorite "get rid of ZSETs" tricks, which
results in significant memory savings for a fairly
large subset of problems
Visitor action sequences
Problem:
● How are my funnels performing?
● These suck:
Visitor action sequences
Sequence method:
● Each user gets a LIST
● All users are recorded in a ZSET with a score based on
time
● Each action/page RPUSHes the action/page to the LIST
● Clean-up/analyze old sequences based on timestamps
in the ZSET
Drawbacks:
● Memory use can be high for active users
● More detailed events can use more memory
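The sequence method can be sketched like this, assuming in-memory Python structures stand in for Redis: a list per user for the LIST (RPUSH) and a dict of last-seen times for the ZSET (ZADD, then a range-by-score lookup for clean-up). The class and method names are hypothetical.

```python
from collections import defaultdict


class SequenceTracker:
    def __init__(self):
        self.actions = defaultdict(list)  # stand-in for one LIST per user
        self.last_seen = {}               # stand-in for the ZSET: user -> time

    def record(self, user, action, now):
        self.actions[user].append(action)  # RPUSH the action/page
        self.last_seen[user] = now         # update the user's score

    def harvest(self, cutoff):
        """Remove and return the sequences of users idle since before cutoff."""
        idle = [u for u, t in self.last_seen.items() if t < cutoff]
        done = {}
        for user in idle:
            done[user] = self.actions.pop(user)
            del self.last_seen[user]
        return done
```

The harvested sequences keep their order, so funnel analysis can check that one page was visited before another.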
Visitor action sequences
Low memory method:
● Each user gets a bitmap (limit your unique events)
● All actions are mapped to an index in the bitmap
● When a user performs the action/visits the page, set the
bit and update the ZSET
● Clean up/analyze old bitmaps based on timestamps in
the ZSET
Drawbacks:
● No more strict sequence analysis possible
● Memory use is dominated by ZSET storage
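The bitmap variant trades order for space. A minimal sketch, assuming a Python int stands in for each user's Redis bitmap and that the event names in `ACTIONS` are hypothetical:

```python
ACTIONS = ["home", "search", "product", "cart", "checkout"]  # hypothetical events
BIT = {name: index for index, name in enumerate(ACTIONS)}    # action -> bit index


def mark(bitmap, action):
    """Set the bit for an action (the SETBIT step)."""
    return bitmap | (1 << BIT[action])


def completed(bitmap, steps):
    """True if every step was performed at least once, in any order."""
    mask = 0
    for step in steps:
        mask |= 1 << BIT[step]
    return (bitmap & mask) == mask


b = mark(mark(0, "cart"), "home")
print(completed(b, ["home", "cart"]))      # both bits are set
print(completed(b, ["home", "checkout"]))  # checkout bit is missing
```

Because "cart" was marked before "home" here and the result is the same, this shows why strict sequence analysis is no longer possible.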
Geo Notifications
Problem:
● Want to send events to nearby users
● Don't want users to be notified too often
● Reduce radius of results as notifications rise
● Increase radius of results as notifications fall
● Allow for history to be received on connect
Geo Notifications
● Consider the world as a recursively-divided series of
blocks (highest level as 1x1 degree)
● Clients subscribe to all block levels that their user is in
or is interested in
● When writing an event at point (lat,lon):
○ Add the event id to the ZSET for each block level, down to as deep a
partition as you would ever expect to need
○ Trim the ZSETs along the way based on your desired history
○ Check the resulting size of the ZSETs to determine the highest-level
block that is under your limit
○ Publish the event to a channel based on that level
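The write path above can be sketched as follows. Several details are assumptions made for illustration, not taken from the talk: each level halves the block side (level 0 is 1x1 degree), history is trimmed by age, the key format is invented, and lists of (timestamp, event id) pairs stand in for the per-block ZSETs. If every level is over the limit, this sketch falls back to the deepest block.

```python
import math
from collections import defaultdict

MAX_DEPTH = 4  # hypothetical: deepest partition level to maintain


def block_keys(lat, lon, max_depth=MAX_DEPTH):
    """Keys of the blocks containing (lat, lon), from largest to smallest."""
    keys = []
    for level in range(max_depth + 1):
        scale = 2 ** level  # blocks per degree along each axis at this level
        row, col = math.floor(lat * scale), math.floor(lon * scale)
        keys.append(f"geo:{level}:{row}:{col}")
    return keys


class GeoEvents:
    def __init__(self, limit=3, window=3600.0):
        self.limit = limit    # max recent events before the radius shrinks
        self.window = window  # seconds of history kept per block
        self.blocks = defaultdict(list)  # key -> [(timestamp, event_id), ...]

    def write(self, event_id, lat, lon, now):
        """Record the event at every level; return the channel to publish on."""
        keys = block_keys(lat, lon)
        channel = None
        for key in keys:
            events = self.blocks[key]
            events.append((now, event_id))  # add to this level's history
            # Trim entries older than the window (a remove-by-score in Redis).
            events[:] = [e for e in events if e[0] > now - self.window]
            if channel is None and len(events) <= self.limit:
                channel = key  # largest block still within the limit
        return channel if channel is not None else keys[-1]
```

Clients subscribed to the returned channel receive the event; as a block gets busier, events move to smaller blocks, and they move back to larger ones once old entries age out.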
Geo Notifications
Drawbacks:
● Event id/timestamp information is duplicated
● Large histories may use significant memory
(ZSETs can be replaced by LISTs with
minimal changes)
● Old data in un-visited blocks isn't cleaned
out (can add expiration)
Other questions?
Thank you
@dr_josiah www.dr-josiah.com
bit.ly/redis-in-action

Josiah carlson 2013-05-16 - redis analytics
