SlideShare une entreprise Scribd logo
1  sur  55
Télécharger pour lire hors ligne
Building an Activity
Feed with Cassandra
Mark Dunphy, Software Engineer
Behance/Adobe
@dunphtastic
Disclaimer
Not an operations person.
Will pretend to be one for the purpose of this talk.
Quick Overview
What is the Behance Activity Feed?
• Actions
• Comments, Appreciations, Etc
• Entities
• Projects, Works in Progress
• Actors
• Users
Project Entity
Actions taken
by actors
Activity Fan Out
User A publishes
a new project
Write to Follower A’s feed
Write to Follower B’s feed
Write to Follower C’s feed
Write to Follower D’s feed
Now that that’s over…
MongoDB
2011
• Smaller user base (~340,000).
• Built very quickly. Worked well at the time.
• Not well researched.
Fast forward to 2014
• Frequent node failures
• Heavy disk fragmentation caused by deletes
• Slow reads from disk. Started storing in RAM.
• Primary -> Secondary caused downtime for
some.
• Scaled out vertically and horizontally.
Why Cassandra?
• Riak
• Very close. Community seemed lacking.
• Redis
• No native cluster. Too much maintenance.
• Memcached/MySQL
• Too much complex app logic.
Cassandra Wins.
• Fantastic community. #cassandra on Twitter
• Easy to read documentation
• Linearly scalable. Easy to grow cluster.
• Low maintenance overhead for ops team.
• Handles time series data very well.
Learning
• Cassandra Summit 2014
• Other team in Adobe
• Long nights reading documentation
Our Data
• Ephemeral
• “Source of truth” lives in a MySQL database
• Okay with *some* data loss
Our Rules
• User’s feed is comprised of entities with one set
of actions
• User’s feed only contains one of any given entity
• An entity’s set of actions contains up to seven of
the most recent actions taken by that user’s
network
Planning
Language Support
• Most services on Behance are PHP
• No official Datastax PHP driver
–Mark Dunphy, 2014
“Looks like I’m learning python.”
Go to Production
No, nothing is working yet. I didn’t skip a slide.
• App/cluster in production before anything works
• Test real life load
• Fail spectacularly without anybody noticing
• Deploy risky changes without fear
• Run alongside MongoDB
January 19th, 2015
Query Patterns
• “Create your data models based on the queries
you want to run” - Basically Everybody
• Wanted to…
• Read a user’s feed entities by type and time of
most recent action…separately.
• Write/Update a user’s feed entities with new
actions while knowing only user id and entity id
Data Models
–Mark Dunphy, January 2015
“An UPDATE in Cassandra works like an
UPSERT! Let’s store the user’s entire feed in a
single row in a table! It’s so simple!”
First Data Model
CREATE TYPE activity.action (
created_on timestamp,
secondary_entity_id int,
actor_id int,
verb_id int
);
CREATE TYPE activity.entity (
entity_type_id int,
entity_id int
);
CREATE TABLE activity.project_actions (
modified_on timestamp,
entity_id int,
user_id int,
actions list<frozen<action>>,
PRIMARY KEY(user_id, entity_id)
)
CREATE TABLE activity.feeds (
modified_entities list<frozen<entity>>,
modified_on timestamp,
project_ids list<int>,
user_id int,
wip_revision_ids list<int>,
PRIMARY KEY(user_id)
)
First Data Model
First Data Model
Moments Before Everything Exploded
–Mark Dunphy, January 2015
“Okay let’s keep nearly the same model, but
use INSERT and DELETE instead of always
UPDATE. Just use batch statements.”
Second Data Model
Second Data Model
This was also a very very bad idea.
• Lose the benefit of Cassandra being distributed
• All queries go through the same coordinator
which puts a lot of stress and responsibility on
one node.
• Use concurrency and prepared statements
instead. Datastax drivers make this easy.
Second Data Model
Second Data Model
Oops
Okay…
Now we’ve got it.
Winning Data Model
CREATE TYPE activity.action (
created_on timestamp,
secondary_entity_id int,
actor_id int,
verb_id int
);
CREATE TABLE activity.projects (
created_on timestamp,
user_id int,
entity_id int,
actions list<frozen<action>>,
PRIMARY KEY(user_id, created_on, entity_id)
)
CREATE TABLE activity.project_actions (
modified_on timestamp,
entity_id int,
user_id int,
actions list<frozen<action>>,
PRIMARY KEY(user_id, entity_id)
)
Much Nicer
Write Strategy
• “User A comments on Project A. User B follows
User A.”
• Request out to add the comment action to User
B’s feed
• Read existing actions for that entity (Project A) in
B’s feed. Push new action on top.
• Write new actions list into new “row” in projects
table
Read Strategy
• SELECT * FROM projects WHERE user_id
= 123 AND created_on > 123214373
• Optimized for quick/easy reads. More important
that a user’s feed loads quickly than it updating
quickly.
• Use timestamp to “page” through data.
Lessons Learned
• Duplicate your data to achieve desired queries.
Storage is cheap. Writes are cheap.
• Think outside the box. Cassandra is not
relational.
• Never ever ever ignore inserts/deletes in favor of
an update only workflow. Never. It is literally
insane.
Final Specs
• 16 node cluster on AWS EC2 c3.8xlarge
• Mix of SizeTieredCompactionStrategy and
DateTieredCompactionStrategy
• NetworkTopologyStrategy
• Replication factor 3
• ConsistencyLevel = ONE for most requests
Final Specs
• Bursty write volume. Consistent read volume.
• 5k to 80k writes per second
• 2k to 4k reads per second
Questions?
I might have answers.
Thank you!
Mark Dunphy, Software Engineer
Behance/Adobe
@dunphtastic

Contenu connexe

Tendances

Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBMongoDB
 
Lightweight Transactions at Lightning Speed
Lightweight Transactions at Lightning SpeedLightweight Transactions at Lightning Speed
Lightweight Transactions at Lightning SpeedScyllaDB
 
Facebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsFacebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsNitish Upreti
 
Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com confluent
 
Exactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka StreamsExactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka StreamsGuozhang Wang
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesDataStax Academy
 
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022HostedbyConfluent
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBMike Dirolf
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안SANG WON PARK
 
MongoDB WiredTiger Internals
MongoDB WiredTiger InternalsMongoDB WiredTiger Internals
MongoDB WiredTiger InternalsNorberto Leite
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 
MongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To TransactionsMongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To TransactionsMydbops
 
Oak, the architecture of Apache Jackrabbit 3
Oak, the architecture of Apache Jackrabbit 3Oak, the architecture of Apache Jackrabbit 3
Oak, the architecture of Apache Jackrabbit 3Jukka Zitting
 
[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화NAVER D2
 
[215]네이버콘텐츠통계서비스소개 김기영
[215]네이버콘텐츠통계서비스소개 김기영[215]네이버콘텐츠통계서비스소개 김기영
[215]네이버콘텐츠통계서비스소개 김기영NAVER D2
 
elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리Junyi Song
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureDan McKinley
 
Disaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache KafkaDisaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache Kafkaconfluent
 

Tendances (20)

Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Lightweight Transactions at Lightning Speed
Lightweight Transactions at Lightning SpeedLightweight Transactions at Lightning Speed
Lightweight Transactions at Lightning Speed
 
Facebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsFacebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platforms
 
Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com
 
Exactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka StreamsExactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka Streams
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
 
MongoDB WiredTiger Internals
MongoDB WiredTiger InternalsMongoDB WiredTiger Internals
MongoDB WiredTiger Internals
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
MongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To TransactionsMongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To Transactions
 
Oak, the architecture of Apache Jackrabbit 3
Oak, the architecture of Apache Jackrabbit 3Oak, the architecture of Apache Jackrabbit 3
Oak, the architecture of Apache Jackrabbit 3
 
[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화
 
Presto: SQL-on-anything
Presto: SQL-on-anythingPresto: SQL-on-anything
Presto: SQL-on-anything
 
[215]네이버콘텐츠통계서비스소개 김기영
[215]네이버콘텐츠통계서비스소개 김기영[215]네이버콘텐츠통계서비스소개 김기영
[215]네이버콘텐츠통계서비스소개 김기영
 
elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds Architecture
 
Disaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache KafkaDisaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache Kafka
 

En vedette

Mobile 2: What's My Place in the Universe? Using Geo-Indexing to Solve Existe...
Mobile 2: What's My Place in the Universe? Using Geo-Indexing to Solve Existe...Mobile 2: What's My Place in the Universe? Using Geo-Indexing to Solve Existe...
Mobile 2: What's My Place in the Universe? Using Geo-Indexing to Solve Existe...MongoDB
 
Socialite, the Open Source Status Feed
Socialite, the Open Source Status FeedSocialite, the Open Source Status Feed
Socialite, the Open Source Status FeedMongoDB
 
Socialite, the Open Source Status Feed Part 3: Scaling the Data Feed
Socialite, the Open Source Status Feed Part 3: Scaling the Data FeedSocialite, the Open Source Status Feed Part 3: Scaling the Data Feed
Socialite, the Open Source Status Feed Part 3: Scaling the Data FeedMongoDB
 
Building a Directed Graph with MongoDB
Building a Directed Graph with MongoDBBuilding a Directed Graph with MongoDB
Building a Directed Graph with MongoDBTony Tam
 
Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2MongoDB
 
MongoGraph - MongoDB Meets the Semantic Web
MongoGraph - MongoDB Meets the Semantic WebMongoGraph - MongoDB Meets the Semantic Web
MongoGraph - MongoDB Meets the Semantic WebDATAVERSITY
 
MongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDBMongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDBMongoDB
 
MongoDB Days Silicon Valley: Implementing Graph Databases with MongoDB
MongoDB Days Silicon Valley: Implementing Graph Databases with MongoDBMongoDB Days Silicon Valley: Implementing Graph Databases with MongoDB
MongoDB Days Silicon Valley: Implementing Graph Databases with MongoDBMongoDB
 
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesBack to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesMongoDB
 
Back to Basics Webinar 2: Your First MongoDB Application
Back to Basics Webinar 2: Your First MongoDB ApplicationBack to Basics Webinar 2: Your First MongoDB Application
Back to Basics Webinar 2: Your First MongoDB ApplicationMongoDB
 
Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLMongoDB
 
Webinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsWebinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsMongoDB
 
Using MongoDB as a high performance graph database
Using MongoDB as a high performance graph databaseUsing MongoDB as a high performance graph database
Using MongoDB as a high performance graph databaseChris Clarke
 
Intro To MongoDB
Intro To MongoDBIntro To MongoDB
Intro To MongoDBAlex Sharp
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBRavi Teja
 
MongoDB World 2016: Poster Sessions eBook
MongoDB World 2016: Poster Sessions eBookMongoDB World 2016: Poster Sessions eBook
MongoDB World 2016: Poster Sessions eBookMongoDB
 
Back to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation FrameworkBack to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation FrameworkMongoDB
 
Webinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Webinar: Introducing the MongoDB Connector for BI 2.0 with TableauWebinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Webinar: Introducing the MongoDB Connector for BI 2.0 with TableauMongoDB
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation FrameworkMongoDB
 

En vedette (20)

Mobile 2: What's My Place in the Universe? Using Geo-Indexing to Solve Existe...
Mobile 2: What's My Place in the Universe? Using Geo-Indexing to Solve Existe...Mobile 2: What's My Place in the Universe? Using Geo-Indexing to Solve Existe...
Mobile 2: What's My Place in the Universe? Using Geo-Indexing to Solve Existe...
 
Socialite, the Open Source Status Feed
Socialite, the Open Source Status FeedSocialite, the Open Source Status Feed
Socialite, the Open Source Status Feed
 
Socialite, the Open Source Status Feed Part 3: Scaling the Data Feed
Socialite, the Open Source Status Feed Part 3: Scaling the Data FeedSocialite, the Open Source Status Feed Part 3: Scaling the Data Feed
Socialite, the Open Source Status Feed Part 3: Scaling the Data Feed
 
Building a Directed Graph with MongoDB
Building a Directed Graph with MongoDBBuilding a Directed Graph with MongoDB
Building a Directed Graph with MongoDB
 
Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2
 
MongoGraph - MongoDB Meets the Semantic Web
MongoGraph - MongoDB Meets the Semantic WebMongoGraph - MongoDB Meets the Semantic Web
MongoGraph - MongoDB Meets the Semantic Web
 
MongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDBMongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDB
 
MongoDB Days Silicon Valley: Implementing Graph Databases with MongoDB
MongoDB Days Silicon Valley: Implementing Graph Databases with MongoDBMongoDB Days Silicon Valley: Implementing Graph Databases with MongoDB
MongoDB Days Silicon Valley: Implementing Graph Databases with MongoDB
 
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesBack to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
 
Back to Basics Webinar 2: Your First MongoDB Application
Back to Basics Webinar 2: Your First MongoDB ApplicationBack to Basics Webinar 2: Your First MongoDB Application
Back to Basics Webinar 2: Your First MongoDB Application
 
Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQL
 
Webinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsWebinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in Documents
 
Using MongoDB as a high performance graph database
Using MongoDB as a high performance graph databaseUsing MongoDB as a high performance graph database
Using MongoDB as a high performance graph database
 
Mongo DB
Mongo DBMongo DB
Mongo DB
 
Intro To MongoDB
Intro To MongoDBIntro To MongoDB
Intro To MongoDB
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
MongoDB World 2016: Poster Sessions eBook
MongoDB World 2016: Poster Sessions eBookMongoDB World 2016: Poster Sessions eBook
MongoDB World 2016: Poster Sessions eBook
 
Back to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation FrameworkBack to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation Framework
 
Webinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Webinar: Introducing the MongoDB Connector for BI 2.0 with TableauWebinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Webinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
 

Similaire à Building an Activity Feed with Cassandra

All about that reactive ui
All about that reactive uiAll about that reactive ui
All about that reactive uiPaul van Zyl
 
Bridging Current Reality & Future Vision with Reality Maps
Bridging Current Reality & Future Vision with Reality MapsBridging Current Reality & Future Vision with Reality Maps
Bridging Current Reality & Future Vision with Reality MapsMalini Rao
 
Data Driven Design - Frontend Conference Zurich
Data Driven Design - Frontend Conference ZurichData Driven Design - Frontend Conference Zurich
Data Driven Design - Frontend Conference ZurichMemi Beltrame
 
Cracking web development
Cracking web developmentCracking web development
Cracking web developmentEyal Kenig
 
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Databricks
 
Getting your project off the ground (BuildStuffLt)
Getting your project off the ground (BuildStuffLt)Getting your project off the ground (BuildStuffLt)
Getting your project off the ground (BuildStuffLt)Johannes Brodwall
 
Microservices: Architecture and Practice
Microservices:  Architecture and PracticeMicroservices:  Architecture and Practice
Microservices: Architecture and PracticeIron.io
 
React state management with Redux and MobX
React state management with Redux and MobXReact state management with Redux and MobX
React state management with Redux and MobXDarko Kukovec
 
Building Reactive Real-time Data Pipeline
Building Reactive Real-time Data PipelineBuilding Reactive Real-time Data Pipeline
Building Reactive Real-time Data PipelineTrieu Nguyen
 
Full stack conference talk slides
Full stack conference talk slidesFull stack conference talk slides
Full stack conference talk slidesSameer Al-Sakran
 
SciSoftDays Talk - Howison: Spreading the work in software ecosystems
SciSoftDays Talk - Howison: Spreading the work in software ecosystemsSciSoftDays Talk - Howison: Spreading the work in software ecosystems
SciSoftDays Talk - Howison: Spreading the work in software ecosystemsJames Howison
 
Drupal 8 Initiatives
Drupal 8 InitiativesDrupal 8 Initiatives
Drupal 8 InitiativesAngela Byron
 
GDG Helwan Introduction to python
GDG Helwan Introduction to pythonGDG Helwan Introduction to python
GDG Helwan Introduction to pythonMohamed Hegazy
 
Reaktive Programmierung mit den Reactive Extensions (Rx)
Reaktive Programmierung mit den Reactive Extensions (Rx)Reaktive Programmierung mit den Reactive Extensions (Rx)
Reaktive Programmierung mit den Reactive Extensions (Rx)NETUserGroupBern
 
Build a Web App with JavaScript and jQuery (5:18:17, Los Angeles)
Build a Web App with JavaScript and jQuery (5:18:17, Los Angeles)Build a Web App with JavaScript and jQuery (5:18:17, Los Angeles)
Build a Web App with JavaScript and jQuery (5:18:17, Los Angeles)Thinkful
 

Similaire à Building an Activity Feed with Cassandra (20)

All about that reactive ui
All about that reactive uiAll about that reactive ui
All about that reactive ui
 
2014 Picking a Platform by Anand Kulkarni
2014 Picking a Platform by Anand Kulkarni2014 Picking a Platform by Anand Kulkarni
2014 Picking a Platform by Anand Kulkarni
 
F8 tech talk_pinterest_v4
F8 tech talk_pinterest_v4F8 tech talk_pinterest_v4
F8 tech talk_pinterest_v4
 
Bridging Current Reality & Future Vision with Reality Maps
Bridging Current Reality & Future Vision with Reality MapsBridging Current Reality & Future Vision with Reality Maps
Bridging Current Reality & Future Vision with Reality Maps
 
Data Driven Design - Frontend Conference Zurich
Data Driven Design - Frontend Conference ZurichData Driven Design - Frontend Conference Zurich
Data Driven Design - Frontend Conference Zurich
 
Cracking web development
Cracking web developmentCracking web development
Cracking web development
 
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
 
Getting your project off the ground (BuildStuffLt)
Getting your project off the ground (BuildStuffLt)Getting your project off the ground (BuildStuffLt)
Getting your project off the ground (BuildStuffLt)
 
Microservices: Architecture and Practice
Microservices:  Architecture and PracticeMicroservices:  Architecture and Practice
Microservices: Architecture and Practice
 
React state management with Redux and MobX
React state management with Redux and MobXReact state management with Redux and MobX
React state management with Redux and MobX
 
Building Reactive Real-time Data Pipeline
Building Reactive Real-time Data PipelineBuilding Reactive Real-time Data Pipeline
Building Reactive Real-time Data Pipeline
 
Full stack conference talk slides
Full stack conference talk slidesFull stack conference talk slides
Full stack conference talk slides
 
SciSoftDays Talk - Howison: Spreading the work in software ecosystems
SciSoftDays Talk - Howison: Spreading the work in software ecosystemsSciSoftDays Talk - Howison: Spreading the work in software ecosystems
SciSoftDays Talk - Howison: Spreading the work in software ecosystems
 
306 belmont ssp08agileit
306 belmont ssp08agileit306 belmont ssp08agileit
306 belmont ssp08agileit
 
redpill Forensics
redpill Forensicsredpill Forensics
redpill Forensics
 
Drupal 8 Initiatives
Drupal 8 InitiativesDrupal 8 Initiatives
Drupal 8 Initiatives
 
GDG Helwan Introduction to python
GDG Helwan Introduction to pythonGDG Helwan Introduction to python
GDG Helwan Introduction to python
 
Reaktive Programmierung mit den Reactive Extensions (Rx)
Reaktive Programmierung mit den Reactive Extensions (Rx)Reaktive Programmierung mit den Reactive Extensions (Rx)
Reaktive Programmierung mit den Reactive Extensions (Rx)
 
Build a Web App with JavaScript and jQuery (5:18:17, Los Angeles)
Build a Web App with JavaScript and jQuery (5:18:17, Los Angeles)Build a Web App with JavaScript and jQuery (5:18:17, Los Angeles)
Build a Web App with JavaScript and jQuery (5:18:17, Los Angeles)
 
Gaurav agarwal
Gaurav agarwalGaurav agarwal
Gaurav agarwal
 

Dernier

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Dernier (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 

Building an Activity Feed with Cassandra

  • 1. Building an Activity Feed with Cassandra Mark Dunphy, Software Engineer Behance/Adobe @dunphtastic
  • 2. Disclaimer Not an operations person. Will pretend to be one for the purpose of this talk.
  • 3. Quick Overview What is the Behance Activity Feed?
  • 4.
  • 5. • Actions • Comments, Appreciations, Etc • Entities • Projects, Works in Progress • Actors • Users
  • 8. User A publishes a new project Write to Follower A’s feed Write to Follower B’s feed Write to Follower C’s feed Write to Follower D’s feed
  • 11. • Smaller user base (~340,000). • Built very quickly. Worked well at the time. • Not well researched.
  • 13. • Frequent node failures • Heavy disk fragmentation caused by deletes • Slow reads from disk. Started storing in RAM. • Primary -> Secondary caused downtime for some. • Scaled out vertically and horizontally.
  • 15. • Riak • Very close. Community seemed lacking. • Redis • No native cluster. Too much maintenance. • Memcached/MySQL • Too much complex app logic.
  • 17. • Fantastic community. #cassandra on Twitter • Easy to read documentation • Linearly scalable. Easy to grow cluster. • Low maintenance overhead for ops team. • Handles time series data very well.
  • 19. • Cassandra Summit 2014 • Other team in Adobe • Long nights reading documentation
  • 21. • Ephemeral • “Source of truth” lives in a MySQL database • Okay with *some* data loss
  • 23. • User’s feed is comprised of entities with one set of actions • User’s feed only contains one of any given entity • An entity’s set of actions contains up to seven of the most recent actions taken by that user’s network
  • 25. Language Support • Most services on Behance are PHP • No official Datastax PHP driver
  • 26. –Mark Dunphy, 2014 “Looks like I’m learning python.”
  • 27. Go to Production No, nothing is working yet. I didn’t skip a slide.
  • 28. • App/cluster in production before anything works • Test real life load • Fail spectacularly without anybody noticing • Deploy risky changes without fear • Run alongside MongoDB
  • 30. Query Patterns • “Create your data models based on the queries you want to run” - Basically Everybody • Wanted to… • Read a user’s feed entities by type and time of most recent action…separately. • Write/Update a user’s feed entities with new actions while knowing only user id and entity id
  • 32. –Mark Dunphy, January 2015 “An UPDATE in Cassandra works like an UPSERT! Let’s store the user’s entire feed in a single row in a table! It’s so simple!” First Data Model
  • 33. CREATE TYPE activity.action ( created_on timestamp, secondary_entity_id int, actor_id int, verb_id int ); CREATE TYPE activity.entity ( entity_type_id int, entity_id int );
  • 34. CREATE TABLE activity.project_actions ( modified_on timestamp, entity_id int, user_id int, actions list<frozen<action>>, PRIMARY KEY(user_id, entity_id) )
  • 35. CREATE TABLE activity.feeds ( modified_entities list<frozen<entity>>, modified_on timestamp, project_ids list<int>, user_id int, wip_revision_ids list<int>, PRIMARY KEY(user_id) )
  • 37. First Data Model Moments Before Everything Exploded
  • 38. –Mark Dunphy, January 2015 “Okay let’s keep nearly the same model, but use INSERT and DELETE instead of always UPDATE. Just use batch statements.” Second Data Model
  • 39. Second Data Model This was also a very very bad idea.
  • 40. • Lose the benefit of Cassandra being distributed • All queries go through the same coordinator which puts a lot of stress and responsibility on one node. • Use concurrency and prepared statements instead. Datastax drivers make this easy. Second Data Model
  • 45. CREATE TYPE activity.action ( created_on timestamp, secondary_entity_id int, actor_id int, verb_id int );
  • 46. CREATE TABLE activity.projects ( created_on timestamp, user_id int, entity_id int, actions list<frozen<action>>, PRIMARY KEY(user_id, created_on, entity_id) )
  • 47. CREATE TABLE activity.project_actions ( modified_on timestamp, entity_id int, user_id int, actions list<frozen<action>>, PRIMARY KEY(user_id, entity_id) )
  • 49. Write Strategy • “User A comments on Project A. User B follows User A.” • Request out to add the comment action to User B’s feed • Read existing actions for that entity (Project A) in B’s feed. Push new action on top. • Write new actions list into new “row” in projects table
  • 50. Read Strategy • SELECT * FROM projects WHERE user_id = 123 AND created_on > 123214373 • Optimized for quick/easy reads. More important that a user’s feed loads quickly than it updating quickly. • Use timestamp to “page” through data.
  • 51. Lessons Learned • Duplicate your data to achieve desired queries. Storage is cheap. Writes are cheap. • Think outside the box. Cassandra is not relational. • Never ever ever ignore inserts/deletes in favor of an update only workflow. Never. It is literally insane.
  • 52. Final Specs • 16 node cluster on AWS EC2 c3.8xlarge • Mix of SizeTieredCompactionStrategy and DateTieredCompactionStrategy • NetworkTopologyStrategy • Replication factor 3 • ConsistencyLevel = ONE for most requests
  • 53. Final Specs • Bursty write volume. Consistent read volume. • 5k to 80k writes per second • 2k to 4k reads per second
  • 55. Thank you! Mark Dunphy, Software Engineer Behance/Adobe @dunphtastic