Cassandra at Digby

Cody Koeninger
ckoeninger@digby.com
Localpoint Architecture
[Architecture diagram; only its labels survive, and the original layout is not recoverable.]
● Localpoint In-App SDK: Location Algorithm – Opt-in – Push – Rich Message – Message Management
● Localpoint Cloud: Messaging, Location, Analytics / Events Engine, Profiles
● Identity • Attributes • Location History • Campaign History
● Campaign Management (Push, Triggered) – Mobile Offer Management – Campaign Reporting
● Accuracy, Power, Privacy Optimization – Geofence Management – Cross-OS, Cross-Device
● Visits – Dwell Time – Frequency – Occupancy
● Location API, Campaign API, Real-Time API, CRM API, Analytics Engine API
● Create/Manage, Publish/Subscribe, Transaction Record Export, Web Console
© 2013 Digby. CONFIDENTIAL
Why Cassandra?
● Somewhat of a green field project: add market segmentation (aka “Profiles”) to our existing geolocation / messaging infrastructure
● Horizontal scalability
● Homogeneous deployment, less ops pain
● No pre-existing investment in Hadoop
● Data model matches our problem
Devices
● Android and iOS mobile devices
● Unique ID
● Other parts of the codebase handle geolocation. Here we're concerned primarily with device as an ID
● ~Millions of devices
Attributes
● Arbitrary key-value pairs associated with devices
● Defined by marketers and app developers
● String, boolean, integer, date
● Encrypted due to PII concerns
● e.g. birthdate: 1989-01-01, ownsPs3: true
● ~100 attributes
Profiles
● Market segmentation on attributes of devices
● Boolean expressions comparing an attribute to a fixed value
● Combined via Boolean 'and', aka set intersection. No 'or'
● e.g. wantsPs4: birthdate >= 1978-01-01 && ownsPs3 == true && ownsPs4 == false
● May be defined long after attributes are defined
● ~100 profiles
Data Modeling
● For nonrelational data stores, you need to know what your queries are before you store data
● Probably true of relational databases as well, but they let you get away with it
● Answering queries via primary key is ideal
● Cassandra has 2 parts to a primary key lookup: partitioning (by hash), then clustering (by order)
Use Case 1: Triggered Messaging
● When a device breaches a geofence, check to see if it is in a profile, then send a promotion
● e.g. device is near a store, and is in the wantsPs4 profile, tell it there are Ps4s in stock
● Latency is important
● Query: Given a device, which profiles is it in?
Use Case 2: Scheduled Messaging
● At some date and time, find all the devices in a given profile, and send them a promotion
● e.g. send all devices in the wantsPs4 profile a message telling them Ps4 is out of stock for months, but Xbox One is on sale cheap
● Throughput is more important than latency
● Query: Given a profile, which devices are in it?
Use Case 3: Historical Analytics
● Marketers may want to analyze past data based on attributes that were known at that time, but not included in profiles at that time
● In other words, we need to know raw facts (attributes), not just derived conclusions (profile membership)
● Query: Given a device and time, what were the attributes for that device at that time?
Brainstorming
● Need to answer 3 questions:
    ● given Device, get Profiles
    ● given Profile, get Devices
    ● given (Device, Time), get Attributes
given (Device, Time), get Attributes
    -- One partition per (brandCode, deviceId); rows inside it are clustered newest first
    create table attributes (
        brandCode ascii,
        deviceId ascii,
        unixtime bigint,
        attrs blob,
        primary key ((brandCode, deviceId), unixtime)
    ) with compact storage
    and clustering order by (unixtime desc);

    -- Most recent attributes recorded at or before the requested time
    select attrs from attributes
    where brandCode = ? and deviceId = ? and unixtime <= ?
    limit 1;
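To make the lookup semantics concrete, here is a minimal in-memory sketch in plain Scala (no Cassandra involved). It models a partition as a vector of rows kept newest first, so "unixtime <= ? limit 1" becomes "first row at or before the given time". The names AttributeHistorySketch, build, attrsAt and latest are hypothetical, and attrs is a String here only for illustration; the real table stores an opaque blob.

```scala
object AttributeHistorySketch {
  // Partition key of the attributes table: (brandCode, deviceId).
  type PartitionKey = (String, String)
  // One clustered row: (unixtime, attrs). A String stands in for the blob.
  type Row = (Long, String)

  // Group rows by partition key and keep each partition sorted newest first,
  // mimicking "clustering order by (unixtime desc)".
  def build(rows: Seq[(PartitionKey, Row)]): Map[PartitionKey, Vector[Row]] =
    rows.groupBy(_._1).map { case (key, keyed) =>
      key -> keyed.map(_._2).sortBy(_._1)(Ordering[Long].reverse).toVector
    }

  // "where brandCode = ? and deviceId = ? and unixtime <= ? limit 1"
  def attrsAt(table: Map[PartitionKey, Vector[Row]],
              brandCode: String, deviceId: String, asOf: Long): Option[String] =
    table.get((brandCode, deviceId)).flatMap(_.find(_._1 <= asOf)).map(_._2)

  // "where brandCode = ? and deviceId = ? limit 1": the newest row.
  def latest(table: Map[PartitionKey, Vector[Row]],
             brandCode: String, deviceId: String): Option[String] =
    table.get((brandCode, deviceId)).flatMap(_.headOption).map(_._2)

  def main(args: Array[String]): Unit = {
    val table = build(Seq(
      ("acme", "d1") -> (100L, "ownsPs3=false"),
      ("acme", "d1") -> (200L, "ownsPs3=true")
    ))
    println(attrsAt(table, "acme", "d1", 150L))
    println(latest(table, "acme", "d1"))
  }
}
```

Because the rows are already in descending time order, both queries stop at the first matching row, which is why a single table can serve the historical lookup and the "current attributes" lookup.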
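given Device, get Profiles
    -- Rows are clustered newest first, so limit 1 returns the device's current attributes
    select attrs from attributes
    where brandCode = ? and deviceId = ?
    limit 1;
Then, in code, filter the (relatively small) set of profiles based on whether the attrs match each profile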
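The in-code filtering step might look like the sketch below. It is an assumption about shape, not the production code: the types Value, Condition and Profile and the functions holds, matches and profilesFor are all hypothetical names. A profile is a list of conditions combined with 'and' only, each comparing one attribute to a fixed value. A missing attribute is treated as a failed condition, which is a design choice of this sketch.

```scala
import java.time.LocalDate

object ProfileMatchingSketch {
  // The four attribute types named in the deck: string, boolean, integer, date.
  sealed trait Value
  final case class Str(v: String) extends Value
  final case class Bool(v: Boolean) extends Value
  final case class Num(v: Long) extends Value
  final case class Day(v: LocalDate) extends Value

  sealed trait Op
  case object Eq extends Op
  case object Ge extends Op
  case object Le extends Op

  final case class Condition(attr: String, op: Op, fixed: Value)
  final case class Profile(name: String, conditions: List[Condition])

  // Compare two values of the same type; mismatched types do not compare.
  private def compare(a: Value, b: Value): Option[Int] = (a, b) match {
    case (Str(x), Str(y))   => Some(x.compareTo(y))
    case (Bool(x), Bool(y)) => Some(Ordering[Boolean].compare(x, y))
    case (Num(x), Num(y))   => Some(Ordering[Long].compare(x, y))
    case (Day(x), Day(y))   => Some(x.compareTo(y))
    case _                  => None
  }

  // A condition holds only if the attribute is present and compares as required.
  def holds(c: Condition, attrs: Map[String, Value]): Boolean =
    attrs.get(c.attr).flatMap(compare(_, c.fixed)).exists { cmp =>
      c.op match {
        case Eq => cmp == 0
        case Ge => cmp >= 0
        case Le => cmp <= 0
      }
    }

  // Conditions are combined with 'and' only, as in the deck.
  def matches(p: Profile, attrs: Map[String, Value]): Boolean =
    p.conditions.forall(holds(_, attrs))

  def profilesFor(attrs: Map[String, Value], profiles: Seq[Profile]): Seq[String] =
    profiles.filter(matches(_, attrs)).map(_.name)

  def main(args: Array[String]): Unit = {
    val wantsPs4 = Profile("wantsPs4", List(
      Condition("birthdate", Ge, Day(LocalDate.parse("1978-01-01"))),
      Condition("ownsPs3", Eq, Bool(true)),
      Condition("ownsPs4", Eq, Bool(false))
    ))
    val attrs = Map[String, Value](
      "birthdate" -> Day(LocalDate.parse("1989-01-01")),
      "ownsPs3" -> Bool(true),
      "ownsPs4" -> Bool(false)
    )
    println(profilesFor(attrs, Seq(wantsPs4)))
  }
}
```

With roughly 100 profiles per brand, evaluating every profile against one device's attributes is a small loop, which fits the latency requirement of triggered messaging.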
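given Profile, get Devices
    -- One partition per (brandCode, profileId); member devices are clustered by deviceId
    create table profile_devices (
        brandCode ascii,
        profileId bigint,
        deviceId ascii,
        primary key ((brandCode, profileId), deviceId)
    ) with compact storage;

    -- All devices currently in the profile
    select deviceId from profile_devices
    where brandCode = ? and profileId = ?;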
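The deck does not show how profile_devices gets populated. One plausible approach, sketched here purely as an assumption, is a batch pass that evaluates every profile against each device's latest attributes and emits one (brandCode, profileId, deviceId) row per match. The object and function names are hypothetical, and profiles are modelled as plain predicates to keep the sketch short.

```scala
object ProfileDevicesSketch {
  type Attrs = Map[String, String]
  // Hypothetical row shape mirroring profile_devices: (brandCode, profileId, deviceId).
  type Row = (String, Long, String)

  // latestAttrs: current attributes keyed by (brandCode, deviceId).
  // profiles: membership predicate keyed by (brandCode, profileId).
  def rows(latestAttrs: Map[(String, String), Attrs],
           profiles: Map[(String, Long), Attrs => Boolean]): Set[Row] =
    (for {
      ((brand, deviceId), attrs) <- latestAttrs.toSeq
      ((profileBrand, profileId), inProfile) <- profiles.toSeq
      if profileBrand == brand && inProfile(attrs)
    } yield (brand, profileId, deviceId)).toSet

  // "select deviceId from profile_devices where brandCode = ? and profileId = ?"
  // Sorted to mimic clustering by deviceId.
  def devicesIn(table: Set[Row], brandCode: String, profileId: Long): List[String] =
    table.collect {
      case (b, p, d) if b == brandCode && p == profileId => d
    }.toList.sorted

  def main(args: Array[String]): Unit = {
    val latest = Map(
      ("acme", "d1") -> Map("ownsPs3" -> "true"),
      ("acme", "d2") -> Map("ownsPs3" -> "false")
    )
    val profiles = Map(
      ("acme", 1L) -> ((a: Attrs) => a.get("ownsPs3").contains("true"))
    )
    println(devicesIn(rows(latest, profiles), "acme", 1L))
  }
}
```

The nested loop is device count times profile count per brand, so at millions of devices this is throughput-bound batch work rather than something to do per request.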
Why Spark?
● Scala
● Distributed computing that will interop with Hadoop IO (and thus Cassandra), but doesn't depend on HDFS
● Approachable codebase (20kloc, vs 200kloc+)
● Interactive shell
● Fast to write, fast to run
Why Spark?
    // Word count; spark is assumed to be the SparkContext
    val file = spark.textFile("hdfs://...")
    val counts = file.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
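For readers without a cluster at hand, the same pipeline can be sketched on ordinary Scala collections. This is not Spark code: groupBy followed by a per-key reduce stands in for reduceByKey, and the name LocalWordCountSketch is hypothetical. Dropping empty tokens is an addition of this sketch.

```scala
object LocalWordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" ").toSeq) // one element per word
      .filter(_.nonEmpty)                     // ignore empty tokens from repeated spaces
      .map(word => (word, 1))                 // pair each word with a count of 1
      .groupBy(_._1)                          // collect the pairs for each word
      .map { case (word, pairs) =>
        word -> pairs.map(_._2).reduce(_ + _) // local stand-in for reduceByKey(_ + _)
      }

  def main(args: Array[String]): Unit =
    wordCount(Seq("to be or not to be")).foreach {
      case (word, count) => println(word + ":" + count)
    }
}
```

The shape of the code is nearly identical to the Spark version, which is much of the "fast to write" argument: the distributed job reads like a local collection transformation.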
Deployment
● http://spark.incubator.apache.org/docs/0.8.1/spark-standalone.html
● Spark worker processes on Cassandra storage nodes
● Gives data locality
● Spark master process on Cassandra monitoring machine
● Cluster start/stop done via ssh key from master
● Submit jobs to master URL
● Consider pre-installing dependency jars on workers
● Must use exact same binary version of Scala throughout
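Spark / Cassandra Interop
    // from CassandraTest.scala in the Spark distro
    // sc (the SparkContext) and job (the Hadoop Job holding the Cassandra input config) are set up earlier in that file
    import java.nio.ByteBuffer
    import java.util.SortedMap
    import org.apache.cassandra.db.IColumn
    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat
    import org.apache.cassandra.utils.ByteBufferUtil

    val casRdd = sc.newAPIHadoopRDD(job.getConfiguration(),
      classOf[ColumnFamilyInputFormat], classOf[ByteBuffer],
      classOf[SortedMap[ByteBuffer, IColumn]])

    // Let us first get all the paragraphs from the retrieved rows
    val paraRdd = casRdd.map {
      case (key, value) => {
        ByteBufferUtil.string(value.get(ByteBufferUtil.bytes("para")).value())
      }
    }

    // Let's get the word count in paras
    val counts = paraRdd.flatMap(p => p.split(" ")).
      map(word => (word, 1)).
      reduceByKey(_ + _)

    counts.collect().foreach {
      case (word, count) => println(word + ":" + count)
    }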
Spark Resources
● Project homepage: http://spark.incubator.apache.org/
● AMP Camp tutorials: http://ampcamp.berkeley.edu/
● Introduction to Spark internals: http://www.youtube.com/watch?v=49Hr5xZyTEA

Editor's notes

  1. Short Script: “All of this is made possible by the advanced technology we’ve made available in the Digby Mobile Suite, an enterprise-grade and PCI certified SaaS platform that is our focus as a company. Our customer, using Digby Services or in self-implementation mode, can use each of these products, in blue, to support the building of applications. Each of them is modular and works with the others, all of them connected to the base platform and a collection of shared services and integration points. The Digby Mobile Console, as mentioned before, is the place where customers can manage the products they have deployed and access relevant analytics both within each product and across the entire solution. This Digby Mobile Suite allows for the deployment of powerful mobile websites and rich applications quickly, efficiently, and with less risk than any custom-built work. It handles cross-platform differences elegantly. And in a space that is constantly changing and innovating, each of these products has its own roadmap where we continue to handle any platform changes and bring innovations to market that make the products more powerful over time. Additionally, future products mean that customers can extend the capabilities of their mobile footprint even more widely, ensuring they are keeping pace with consumer expectations.”