To meet all of our data needs, including high-volume ingestion, MapReduce capabilities, real-time analytics, historical analytics, and other analysis technologies, we needed to incorporate Redis, Mongo, a MySQL column store, and Cassandra. Wrap the whole thing in a Node.js API for speed and consistent access patterns, and you have a whole data storage spread.
Talk URL: http://www.youtube.com/watch?v=od6DdB-zJCk
1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
7. Overview
• SimpleReach
• Definitions and Data Stores
• Evolution to Polyglottany
• Tie It Together
• Questions
11. Size
• 100M events recorded per day and growing
• 500M pageviews per month and growing
12. Polyglot Persistence
Polyglot Persistence, like polyglot programming, is all about choosing the right persistence option for the task at hand.
http://www.sleberknight.com/blog/sleberkn/entry/polyglot_persistence
13. Right Tool For The Job
17. Why?
• Heavier READ loads vs heavier write loads
• Data relationships may be less important
• Different aspects of a system have different requirements
18. No One Size Fits All
30. Cassandra
• Large data volume ingestion
• Really fast writes to many locations (eventual consistency)
• Query by column groups within rows
• Range queries in Hive (partial CF scans)
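One way to picture "query by column groups within rows" is a wide row whose columns are kept sorted by name, so a contiguous group of columns can be read as one slice. The sketch below is a plain in-memory Python model, not Cassandra or Helenus code; the `WideRow` class and the `hour:metric` column naming are assumptions made only for illustration.

```python
import bisect


class WideRow:
    """Toy model of one wide row: columns are kept sorted by name."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = {}  # column name -> value

    def insert(self, name, value):
        # Keep the name list sorted so a column group is contiguous.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start, end):
        """Return (name, value) pairs with start <= name < end."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_left(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


row = WideRow()  # hypothetical row: all counters for one piece of content
row.insert("10:tweets", 12)
row.insert("10:likes", 40)
row.insert("11:tweets", 7)

# Read only the column group for hour 10 (";" sorts right after ":").
hour_10 = row.slice("10:", "10;")
```

Because the columns are ordered, the slice touches only the matching group instead of every column in the row.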
35. mongoDB
• Fast atomic increments (Node.js is native JSON)
• Sharding for faster distributed increments
• Solid ORM for Rails (MongoID)
• Fast access for pub/sub of durable/persisted documents
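An atomic increment in MongoDB is expressed with the `$inc` update operator, so the counter is changed on the server without a read-modify-write round trip. The sketch below only builds the filter and update documents as plain Python dicts; the `build_increment` helper, the `counts.<action>` field layout, and the `article-42` id are hypothetical and not taken from the talk.

```python
def build_increment(content_id, action, amount=1):
    """Build the filter and update documents for an atomic counter bump."""
    query = {"_id": content_id}
    # "$inc" adds `amount` to the field on the server side.
    update = {"$inc": {"counts." + action: amount}}
    return query, update


query, update = build_increment("article-42", "tweets")
# With a live server and the pymongo driver, this could be applied as:
#   collection.update_one(query, update, upsert=True)
```

Since the documents are ordinary JSON-like structures, the same shape can be produced from Node.js without any translation layer.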
40. Redis
• Supports hundreds of thousands of transactions per second
• Great caching engine
• Supports useful variable types like sorted set
• Pay SerDe price on each access
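The "SerDe price" can be illustrated with a store that only holds strings: a structured object has to be serialized on every write and deserialized on every read. The sketch below uses a plain Python dict as a stand-in for such a store; it is not Redis client code, and the `put`/`get` helpers and key names are assumptions for illustration.

```python
import json

store = {}  # stand-in for a string-valued key/value store


def put(key, obj):
    store[key] = json.dumps(obj)   # serialize on every write


def get(key):
    return json.loads(store[key])  # deserialize on every read


put("content:42", {"tweets": 12, "likes": 40})

# Bumping one counter still costs a full decode and re-encode.
doc = get("content:42")
doc["tweets"] += 1
put("content:42", doc)
```

This is the cost that native types such as sorted sets help avoid, since individual members can be updated without rewriting a whole blob.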
45. InfiniDB and Infobright
• Column Stores for ad-hoc analytics queries in SQL
• Databases built for business intelligence
• Heavy compression of data
• Pre-aggregated data (Extents/Knowledge Grid)
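The idea behind pre-aggregated metadata can be sketched as per-block minimum and maximum values that let a range query skip or fully accept whole blocks without scanning them. The code below is a simplified Python model under that assumption, not the actual InfiniDB or Infobright implementation; `build_blocks`, `count_in_range`, and the block size are hypothetical.

```python
def build_blocks(values, block_size):
    """Split a column into blocks and record min/max for each one."""
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        blocks.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return blocks


def count_in_range(blocks, low, high):
    """Count values with low <= v <= high, using block metadata first."""
    total, scanned = 0, 0
    for b in blocks:
        if b["max"] < low or b["min"] > high:
            continue                     # block cannot match: skip it
        if low <= b["min"] and b["max"] <= high:
            total += len(b["rows"])      # block fully matches: no scan
            continue
        scanned += 1                     # partial overlap: scan the rows
        total += sum(1 for v in b["rows"] if low <= v <= high)
    return total, scanned


blocks = build_blocks(list(range(100)), block_size=10)
total, scanned = count_in_range(blocks, 25, 64)
```

In this toy run only the two blocks that straddle the range boundaries are scanned; the rest are answered from metadata alone.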
50. Ruby, Node.js, Python
• Polyglottany doesn’t only apply to data stores
• Each language has its own benefit to each data storage layer
• Each language has its own individual benefits
• JSON, APIs, Performance
59. Cons
• Redis - Can only utilize a single core
• MySQL Column Store - DELETE/UPDATEs are VERY expensive
• Cassandra - No btree indexes
• Mongo - Queries slow down when shard count increases. Indexes must fit in memory
• Python - Whitespace. Community
• Ruby - Not high performance enough for our standards
• JavaScript (Node.js) - Bad for CPU- or IO-intensive workloads
65. Tying It Together
• Built in the cloud
• Service Oriented Architecture (Internal API)
• Built Helenus (Cassandra Node.js driver)
• Data accuracy checks: visual and programmatic
• Built framework for testing out storage engines
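A programmatic data accuracy check across stores could be as simple as reading the same counters back from two stores and reporting keys that disagree. The sketch below assumes the counts have already been fetched into plain dicts; `find_mismatches`, the tolerance parameter, and the sample data are hypothetical and do not describe SimpleReach's actual checks.

```python
def find_mismatches(primary, secondary, tolerance=0):
    """Return keys whose counts differ by more than `tolerance`."""
    mismatches = {}
    for key in sorted(set(primary) | set(secondary)):
        a = primary.get(key, 0)    # a missing key counts as zero
        b = secondary.get(key, 0)
        if abs(a - b) > tolerance:
            mismatches[key] = (a, b)
    return mismatches


# Hypothetical per-content event counts read back from two stores.
realtime_counts = {"article-1": 100, "article-2": 58, "article-3": 7}
batch_counts = {"article-1": 100, "article-2": 60}

report = find_mismatches(realtime_counts, batch_counts, tolerance=1)
```

A small tolerance is useful when one store is eventually consistent and may lag slightly behind the other.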
66. Service Architecture
Analytics
Real-time
Internal API
71. Helenus
• Built Node.js driver for Cassandra
• https://github.com/simplereach/helenus
• CQL 2/3, Composite Column, Thrift Interface
• More about Node.js and Cassandra
76. Points To Consider
• Data consistency - Same in all data stores
• How important is data durability?
• Managing many servers (Chef, AWS, CSSH)
• Managing and learning many different applications and tuning for them
81. Summary
• Polyglottany is not a sin
• Know your data read/write patterns
• Know the tools available to you
• Know your compromises
83. Questions are guaranteed in life.
Answers aren’t.
Eric Lubow
@elubow
elubow@simplereach.com
#cassandra12
Thank you.
Editor's notes
SimpleReach is a social intelligence tool for content creators. We track every social action, on every major network, across the entire web in real time. That means every like, tweet, pin, stumble, and many more.