Polyglottany Is Not A Sin Eric Lubow @elubow
8. Overview
• SimpleReach
• Definitions and Data Stores
• Evolution to Polyglottany
• Tie It Together
• Final Thoughts
• Questions
12. Size
• 150M events recorded per day and growing
• 600M pageviews per month and growing
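For a sense of scale, these totals can be turned into average per-second rates. The sketch below is back-of-the-envelope arithmetic only: it assumes traffic is spread evenly across the day and that a month has 30 days, neither of which comes from the talk, and real traffic will peak well above the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(total, days):
    """Average rate per second for `total` items spread evenly over `days`."""
    return total / (days * SECONDS_PER_DAY)


# 150M events per day
events_per_sec = per_second(150_000_000, days=1)
# 600M pageviews per month (30-day month is an assumption)
pageviews_per_sec = per_second(600_000_000, days=30)

print(f"events/sec:    {events_per_sec:,.0f}")
print(f"pageviews/sec: {pageviews_per_sec:,.0f}")
```

That works out to roughly 1,700 events and 230 pageviews per second on average.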
13. Polyglot Persistence
Polyglot Persistence, like polyglot programming, is all
about choosing the right persistence option for the task
at hand.
http://www.sleberknight.com/blog/sleberkn/entry/polyglot_persistence
14. Right Tool For The Job
15. Decisions. Decisions.
Data
• What are my query patterns?
• Is my data ingestion high volume/high velocity?
• Am I batch loading data?
• Am I write heavy or read heavy?
• Are data relationships important?
• Does my data need to be immediately available everywhere?
• Are my display requirements for realtime data?
• Do I need to aggregate data on the fly?
• Is my data structured or unstructured?
• Does my data lend itself to a specific design pattern?
Tech
• Is the encryption/authentication/authorization support sufficient for my needs?
• Are there monitoring architectures already built?
• Are there best practices guides already
• Will the data need to be distributed?
• How fault tolerant is the system?
• What supporting tools do I need?
• Is there support for my language?
Financial
• Am I cloud based?
• Am I hardware based?
• Am I a cloud/iron hybrid?
• How much am I willing to spend?
• How much am I willing to spend if something goes wrong?
Other
• Do I have legal requirements (HIPAA/FIPS/Sarbanes Oxley/PII)?
• What kind of enterprise support is available?
• What is the community like?
• Does the product roadmap pertain to my roadmap?
16. No One Size Fits All
17. Tools
C*
32. Cassandra C*
• Large data volume ingestion at high velocity
• Really fast writes to many locations (eventual consistency)
• Query by column groups within rows (slicing)
• OpsCenter
• Data toolkit: more than a data storage layer
• TTLs for small group aggregation
• Wrote Helenus, a Node.js driver for Cassandra
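The slicing and TTL bullets can be pictured with a toy model of a single wide row. This is an in-memory illustration of the idea only, not Cassandra's API or the Helenus driver; the `WideRow` class, the minute-bucket column names, and the 300-second TTL are all invented for the example.

```python
import bisect


class WideRow:
    """Toy model of one wide row: columns sorted by name, each with an optional expiry."""

    def __init__(self):
        self._names = []   # column names, kept sorted
        self._cells = {}   # name -> (value, expires_at or None)

    def insert(self, name, value, now, ttl=None):
        """Write a column; with a TTL it stops being visible `ttl` seconds after `now`."""
        if name not in self._cells:
            bisect.insort(self._names, name)
        expires_at = now + ttl if ttl is not None else None
        self._cells[name] = (value, expires_at)

    def slice(self, start, end, now):
        """Return live (name, value) pairs with start <= name <= end, in column order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        out = []
        for name in self._names[lo:hi]:
            value, expires_at = self._cells[name]
            if expires_at is None or now < expires_at:
                out.append((name, value))
        return out


# Hypothetical per-minute counters for one URL, each kept for 300 seconds.
row = WideRow()
row.insert("12:00", 4, now=0, ttl=300)
row.insert("12:01", 7, now=60, ttl=300)
row.insert("12:02", 2, now=120, ttl=300)

window = row.slice("12:01", "12:02", now=130)          # contiguous range of columns
recent_total = sum(value for _, value in window)       # small-group aggregate
after_expiry = row.slice("12:00", "12:02", now=310)    # "12:00" has expired by now
```

Because old buckets age out on their own, a short rolling aggregate only ever has to read the few columns that are still live.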
40. MongoDB
• Fast atomic increments (Node.js is native JSON)
• Sharding
• Solid ODM for Rails (Mongoid)
• Fast access for pub/sub of durable/persisted documents
• B-Tree Indexes
• Document based via JSON
• TTLs for ephemeral data
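As a rough sketch of the atomic-increment bullet, the snippet below builds the filter and update documents for a counter bump using MongoDB's `$inc` operator. The field names (`url`, `counts.<network>`, `total`) and the example URL are hypothetical, not taken from the talk, and the code only constructs the documents so it runs without a server.

```python
def build_increment(url, network, amount=1):
    """Build the (filter, update) pair for an atomic counter bump.

    Field names here are hypothetical and chosen for illustration only.
    """
    query = {"url": url}
    update = {"$inc": {f"counts.{network}": amount, "total": amount}}
    return query, update


query, update = build_increment("http://example.com/post", "twitter")

# With pymongo and a running server, the pair would be sent as:
#   collection.update_one(query, update, upsert=True)
# The server applies $inc atomically per document, so concurrent writers
# do not need a read-modify-write cycle.
print(query)
print(update)
```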
47. Redis
• Supports hundreds of thousands of transactions per second
• Great caching engine
• Supports useful data types like sets, sorted sets, and lists
• Everything is guaranteed to be in memory
• Transactional and supports bulk operations
• Centralized queueing and locking system
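One common way to get a centralized lock out of Redis is `SET key value NX EX seconds`: set the key only if it does not exist, and let it expire on its own. The sketch below is a minimal version of that pattern, not necessarily what was used in this system. The `FakeRedis` class is an in-memory stand-in (without expiry) so the example runs without a server; the functions assume a client whose `set`, `get`, and `delete` behave like redis-py's with `decode_responses=True`.

```python
import uuid


def acquire_lock(client, name, ttl_seconds=30):
    """Try to take the lock; return an ownership token, or None if it is already held."""
    token = uuid.uuid4().hex
    # NX: only set if the key is absent. EX: expire automatically after ttl_seconds.
    if client.set(f"lock:{name}", token, nx=True, ex=ttl_seconds):
        return token
    return None


def release_lock(client, name, token):
    """Release only if we still own the lock.

    Check-then-delete is not atomic; production code would typically use a Lua script.
    """
    key = f"lock:{name}"
    if client.get(key) == token:
        client.delete(key)
        return True
    return False


class FakeRedis:
    """In-memory stand-in for a Redis client (expiry is not simulated)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, nx=False, ex=None):
        if nx and key in self._data:
            return None
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        return 1 if self._data.pop(key, None) is not None else 0


client = FakeRedis()
first = acquire_lock(client, "rollup")        # succeeds
second = acquire_lock(client, "rollup")       # None: lock is already held
released = release_lock(client, "rollup", first)
third = acquire_lock(client, "rollup")        # succeeds again after release
```

The random token is what stops one worker from releasing a lock that has since expired and been taken by another.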
53. Infobright
• Works with standard MySQL driver
• Column Stores for ad-hoc analytics queries in SQL
• Databases built for business intelligence
• Heavy compression of data
• Pre-aggregated data (Knowledge Grid)
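The general idea behind pre-aggregated metadata in a column store can be sketched as follows: keep summary statistics for each block of column values so that a query can skip blocks, or answer from the statistics alone. This is a loose illustration under that assumption, not Infobright's actual data structures; the function names and the tiny pack size are made up for the example.

```python
def build_packs(values, pack_size):
    """Split a column into packs and record min/max/sum/count for each."""
    packs = []
    for i in range(0, len(values), pack_size):
        chunk = values[i:i + pack_size]
        packs.append({
            "rows": chunk,
            "min": min(chunk),
            "max": max(chunk),
            "sum": sum(chunk),
            "count": len(chunk),
        })
    return packs


def sum_where_greater(packs, threshold):
    """SUM(col) WHERE col > threshold, reading rows only when the stats are inconclusive."""
    total, opened = 0, 0
    for pack in packs:
        if pack["max"] <= threshold:
            continue                  # no row can match: skip the pack
        if pack["min"] > threshold:
            total += pack["sum"]      # every row matches: answer from the stats
            continue
        opened += 1                   # mixed pack: the rows must be read
        total += sum(v for v in pack["rows"] if v > threshold)
    return total, opened


column = [1, 2, 3, 4, 50, 60, 70, 80, 5, 90, 6, 7]
packs = build_packs(column, pack_size=4)
total, opened = sum_where_greater(packs, threshold=10)
print(total, opened)
```

Here only one of the three packs has to be opened; the other two are resolved from their statistics.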
58. Ruby, Node.js, Python
• Polyglottany doesn’t only apply to data stores
• Each language has its own benefit to each data storage layer
• Each language has its own individual benefits
• JSON, APIs, Performance
67. Cons
• Redis - Can only utilize a single core. SerDe (serialization/deserialization) cost.
• MySQL Column Store - DELETEs/UPDATEs are VERY expensive
• Cassandra - No B-tree indexes
• Mongo - Indexes must fit in memory. Forced replica ping times
• Python - Whitespace. Community
• Ruby - Not high performance enough for our standards
• JavaScript (Node.js) - Bad for CPU or IO intensive workloads
68. Tying It Together
Even with the right tools, 80% of the work of building a
big data system is acquiring and refining the raw data into
usable data.
74. Tying It Together
• Service Oriented Architecture (Internal API)
• Data accuracy checks: visual and programmatic
• Built framework for testing out storage engines
• Access to many toolsets (for all languages and DBs)
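A programmatic data accuracy check across stores can be as simple as comparing the same counters read from two places. The sketch below is one possible shape for such a check, not the framework described in the talk; the function name, the store names, and the sample counts are all hypothetical.

```python
def compare_counts(primary, secondary, tolerance=0):
    """Return {key: (primary_value, secondary_value)} where counts differ by more than `tolerance`.

    A key missing from one store is treated as a count of 0.
    """
    mismatches = {}
    for key in sorted(set(primary) | set(secondary)):
        a = primary.get(key, 0)
        b = secondary.get(key, 0)
        if abs(a - b) > tolerance:
            mismatches[key] = (a, b)
    return mismatches


# Hypothetical per-URL pageview counts as read from two different stores.
from_realtime_store = {"/a": 100, "/b": 42, "/c": 7}
from_analytics_store = {"/a": 100, "/b": 40, "/d": 3}

report = compare_counts(from_realtime_store, from_analytics_store, tolerance=1)
for key, (a, b) in report.items():
    print(f"{key}: realtime={a} analytics={b}")
```

A small tolerance is useful when the stores are only eventually consistent with each other and a check may run mid-update.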
82. Points To Consider
• Data consistency - Same in all data stores
• How important is data durability?
• Managing many servers (Chef, AWS, CSSH)
• Managing and learning many different applications and tuning for them
• Expertise
86. Expertise
• What happens when you need help?
• How do you become experts?
• What happens when you need more experts?
92. Summary
• Polyglottany is not a sin
• Know your data read/write patterns
• Know the tools available to you
• Know your compromises
• Expertise
94. Questions are guaranteed in life.
Answers aren’t.
Eric Lubow
@elubow
elubow@simplereach.com
#MongoBoston
Thank you.
Editor's notes
SimpleReach is a social intelligence tool for content creators. We track every social action, on every major network, across the entire web in real time. That means every like, tweet, pin, stumble, and many more.