4. Why Big Data Matters
Research by McKinsey & Company shows eye-opening differences in 10-year
category growth rates between businesses that use their big data well and
those that do not.
6. Some users
✤ Financial
✤ Social Media
✤ Advertising
✤ Entertainment
✤ Energy
✤ E-tail
✤ Health care
✤ Government
7. Common use cases
✤ Time series data
✤ Messaging
✤ Ad tracking
✤ Data mining
✤ User activity streams
✤ User sessions
✤ Anything requiring:
Scalable + performant + highly available
14. “With Cassandra, we get better business agility, and we
don’t have to plan capacity in advance, we don’t need to
ask permission of other people to build things for us,
and we don’t worry about running out of space or
power.”
Adrian Cockcroft, Cloud Architect
15. Netflix on Cassandra
✤ Could not build datacenters fast enough
✤ Made decision to go to cloud (AWS)
✤ Applications include Netflix’s subscriber system, A/B
testing, and viewing history service
✤ Over a year in, Netflix finds Cassandra to be:
✤ Fast
✤ Cost-effective
✤ Scalable
✤ Flexible
✤ Reliable: no SPOF
16. “Without Cassandra, our engineers would’ve had to
create something that could scale to our needs, that
would’ve prevented us from focusing on building
product and solving problems for Backupify’s users,
which are far more important tasks.”
Matt Conway, VP Engineering
17. Backupify on Cassandra
✤ Cloud-based utility that enables businesses and
consumers to back up, search and restore the content of
popular online applications such as Google Apps,
Gmail, Facebook, Twitter, and Blogger
✤ Cassandra findings:
✤ Solved scaling, allowing engineers to focus on their business
✤ DataStax OpsCenter made it easy to monitor the health and
performance of their cluster
✤ Reliable, redundant and scalable data storage helped
eliminate downtime
✤ Ability to offer not only backup and storage but also analysis
18. “You can seamlessly add new nodes and expand your
total capacity without deteriorating the performance of
the data store. Cassandra has allowed us to scale very
effectively.”
Harry Robertson, Tech Lead
19. Ooyala on Cassandra
✤ Ooyala provides a suite of technologies and services that
support content owners in managing, analyzing and
monetizing the digital video they publish online
✤ Cassandra findings:
✤ Classic “Big Data” problem did not require re-architecting
✤ Delivered ability to respond to increasingly sophisticated
analytic needs of customers
✤ Developers spend time building application features, not
figuring out how to scale
20. “Cassandra has allowed us to build bigger features
faster and more reliably, while using less money and
without needing to expand our staff.”
Kyle Ambroff, Sr. Engineer
21. Formspring on Cassandra
✤ Users of Formspring engage with and learn more about
each other by asking and responding to questions. Close
to 4B responses in the system and 30M unique users
✤ Cassandra experience:
✤ No sharding needed – just add nodes to scale
✤ Performance – popular users with many followers saw no
slowdown. No more memcached!
✤ Flexibility of a schema-optional architecture is very
developer-friendly
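The "just add nodes to scale" point rests on consistent hashing, the general technique behind token-ring partitioning. The sketch below is a toy illustration in Python, not Cassandra's implementation: the `Ring` class, the `token` helper, the node names, the MD5 hash, and the virtual-token count are all assumptions chosen for the example. It shows that when a fourth node joins a three-node ring, only a fraction of the keys change owner, and every key that moves lands on the new node.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is an illustrative choice only)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hash ring; each node owns several virtual tokens."""

    def __init__(self, vnodes: int = 64):
        self.vnodes = vnodes
        self._tokens = []  # sorted ring positions
        self._owner = {}   # ring position -> node name

    def add_node(self, node: str) -> None:
        # Place `vnodes` tokens for this node at pseudo-random ring positions.
        for i in range(self.vnodes):
            t = token(f"{node}#{i}")
            self._owner[t] = node
            bisect.insort(self._tokens, t)

    def node_for(self, key: str) -> str:
        # Owner is the first token clockwise from the key, wrapping around.
        idx = bisect.bisect_right(self._tokens, token(key)) % len(self._tokens)
        return self._owner[self._tokens[idx]]


def moved_fraction(num_keys: int = 10_000) -> float:
    """Fraction of keys that change owner when a fourth node joins."""
    ring = Ring()
    for name in ("node-a", "node-b", "node-c"):  # hypothetical node names
        ring.add_node(name)
    keys = [f"user:{i}" for i in range(num_keys)]
    before = {k: ring.node_for(k) for k in keys}

    ring.add_node("node-d")
    moved = [k for k in keys if ring.node_for(k) != before[k]]

    # Keys only ever move to the new node, never between existing nodes.
    assert all(ring.node_for(k) == "node-d" for k in moved)
    return len(moved) / num_keys


if __name__ == "__main__":
    print(f"keys moved after adding a node: {moved_fraction():.1%}")
```

With modulo-based sharding (`hash(key) % node_count`), going from three to four nodes would typically remap most keys; on the ring, the new node takes over only the ranges adjacent to its own tokens, so the application needs no resharding logic.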
30. Operations
✤ “Vanilla” Hadoop
✤ 8+ services to set up, monitor, back up, and recover
(NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker,
ZooKeeper, RegionServer, ...)
✤ Single points of failure
✤ Can't separate online and offline processing
✤ DataStax Enterprise
✤ Single, simplified component
✤ Self-organizes based on workload
✤ Peer to peer
✤ JobTracker failover
31. Managing & Monitoring Big Data
✤ DataStax OpsCenter manages and monitors all Cassandra and
Hadoop operations