This document discusses real-time, data-driven services in the cloud. It gives examples of business needs such as adapting quickly to customer demands and scaling IT investments without undue risk. NoSQL databases like Apache Cassandra are presented as solutions for handling big data workloads across data centers with continuous availability and scalability. Cassandra's architecture and performance benefits are summarized. DataStax Enterprise is highlighted as a production-certified platform with analytics, search and multi-datacenter capabilities.
3. Line of Business Manager: Adapt With Customers
“I have to move as fast as my
market. I can't get slowed down
by people telling me this is
going to take six months. It's got
to be ready, quickly. No matter
what. And I need to adapt
quickly with my customers.”
4. VP of IT: How Can I Scale Without Surprises?
“Given the explosion of data in the
enterprise, how can I scale my IT
investment to meet the demands of
my lines of business, without taking
on undue risk? (My choices are to
spend $10 million to scale what I've
got versus do something new)”
5. Nearly All Businesses Must Think Global
Datacenter
Cloud
About 1/2 of all sales will be online by the end of 2013
Source: (http://www.datastax.com/resources/whitepapers/bigdata)
24/7 monitoring demands
Global market demands
Localization deployment
6. Your Data Demands Can Change in an Instant
[Chart: fluctuating traffic demands, 2009 to 2012]
8. DataStax in the News
Big movies, big data: Netflix
embraces NoSQL in the cloud
With billions of reads and writes daily,
Netflix relies on NoSQL database
Cassandra to replace a legacy Oracle
deployment
May 02, 2013
(AP) The company chose Cassandra from
DataStax for its flexibility to create and
manage data clusters quickly, particularly in
the cloud. Christos Kalantzis, Netflix's
manager of cloud and platform engineering,
explains that "solutions like Oracle don't run
very well on virtualized hardware ... the
architecture of Cassandra and the availability
and consistency tuning and scalability made it
9. Major Changes: The Evolving Data Center
LOB
App
Oracle
LOB
App
MySQL
LOB
App
SQL Server
“What’s Happening?”
Hyper Velocity
Transactional
NoSQL
Data Warehouse
Teradata/Exadata
“What Happened?”
Massive Volume
Bit Bucket
Hadoop
12. What is Apache Cassandra?
Apache Cassandra™ is a massively scalable distributed open
source database.
Cassandra is designed to handle big data workloads across
multiple data centres with no single point of failure, providing
enterprises with continuous availability without compromising
performance.
13. Cassandra Architecture Overview
• Fast / Linear performance
• Elastic scalability
• No single point of failure
• Enterprise / multi-data center / cloud data distribution
• Location independence – read and write anywhere
• Tunable data consistency (per operation)
• Familiar SQL-like language – CQL
• Dynamic / Flexible schema
• Can store structured, semi-structured and unstructured data
• Replication strategies from the Amazon Dynamo paper
• Data structure and storage design from the Google BigTable paper
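To make "tunable data consistency (per operation)" a little more concrete, here is a minimal sketch of the Dynamo-style quorum arithmetic behind it. It is not taken from the slides and does not talk to a real cluster. The function names are hypothetical, and only three consistency levels (ONE, QUORUM, ALL) are modelled. The rule it illustrates is that a read is guaranteed to overlap the latest acknowledged write when the read replicas plus the write replicas exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """How many replicas must acknowledge an operation at the given level.

    Only ONE, QUORUM and ALL are modelled in this sketch.
    """
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_overlaps_write(write_level: str, read_level: str,
                        replication_factor: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3  # assumed replication factor, for illustration only
    for write_level, read_level in [("ONE", "ONE"),
                                    ("QUORUM", "QUORUM"),
                                    ("ALL", "ONE")]:
        ok = read_overlaps_write(write_level, read_level, rf)
        print(f"write={write_level:<6} read={read_level:<6} overlap guaranteed: {ok}")
```

Because the level is chosen per operation, an application could write at QUORUM for data that must be read back consistently and use ONE for latency-sensitive reads that tolerate slightly stale results.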
14. Apache Cassandra Leading in Performance
“In terms of scalability, there is a clear winner
throughout our experiments. Cassandra
achieves the highest throughput for the
maximum number of nodes in all experiments
with a linear increasing throughput.”
Solving Big Data Challenges for Enterprise Application Performance Management, Tilmann Rabl,
et al., August 2013, p. 10. Benchmark paper presented at the Very Large Database Conference,
2013. http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2013.pdf
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
Netflix Cloud Benchmark…
End Point Independent NoSQL Benchmark
Highest in throughput…
Lowest in latency…
16. Why We Exist
“I can create a Cassandra cluster
in any region of the world in 10
minutes. When marketing guys
decide we want to move into a
certain part of the world, we're
ready.”
Today's applications must be
always available and lightning
fast as they scale to previously
unimaginable levels.
Cassandra delivers both with a
beautifully simple and elegant
architecture.
17. What We Do Best
Cassandra was designed to do
things that are impossible in
other databases when it comes
to availability and
performance. Forget about
losing a machine here or there --
Cassandra delivers a world
where you can lose an entire
datacenter and still perform as
your customers expect.
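As a rough illustration of the claim about losing an entire datacenter, the sketch below checks which consistency levels could still be met when one data center is unreachable. It is a simplified model, not DataStax code. The data center names and replica counts are made up, a data center is treated as entirely up or entirely down, and only LOCAL_QUORUM, QUORUM and EACH_QUORUM are modelled.

```python
from typing import Dict, Set


def quorum(replicas: int) -> int:
    """Smallest majority of replicas: floor(n / 2) + 1."""
    return replicas // 2 + 1


def can_satisfy(level: str,
                replicas_per_dc: Dict[str, int],
                live_dcs: Set[str],
                local_dc: str) -> bool:
    """Check whether a request at `level` can gather enough live replicas.

    Simplifying assumption: every replica in a live data center is up and
    every replica in a failed data center is down.
    """
    alive = {dc: (count if dc in live_dcs else 0)
             for dc, count in replicas_per_dc.items()}
    if level == "LOCAL_QUORUM":
        # Majority of the replicas in the coordinator's own data center.
        return alive[local_dc] >= quorum(replicas_per_dc[local_dc])
    if level == "QUORUM":
        # Majority of all replicas across every data center.
        return sum(alive.values()) >= quorum(sum(replicas_per_dc.values()))
    if level == "EACH_QUORUM":
        # Majority in each data center separately.
        return all(alive[dc] >= quorum(count)
                   for dc, count in replicas_per_dc.items())
    raise ValueError(f"unsupported consistency level: {level}")


if __name__ == "__main__":
    # Hypothetical layout: three replicas in each of two data centers.
    layout = {"us-east": 3, "eu-west": 3}
    survivors = {"us-east"}  # eu-west is assumed lost
    for level in ("LOCAL_QUORUM", "QUORUM", "EACH_QUORUM"):
        ok = can_satisfy(level, layout, survivors, local_dc="us-east")
        print(f"{level:<12} still satisfiable: {ok}")
```

In this model, requests served at LOCAL_QUORUM in the surviving data center keep working, while levels that need replicas from the failed data center do not, which is one reason the choice of consistency level matters for active-active deployments.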
“We have to be ready for disaster
recovery all the time. It's really
great that Cassandra allows for
active-active multiple data centers
where we can read and write
anywhere”
Jay Patel
Technical Architect at eBay
(Describing why they switched from legacy
relational architecture)
18. Without Breaking Your Budget
“To do what we need to
do today without
Cassandra would cost a
couple million dollars
more and would be
significantly harder to
manage operationally.”
19. DataStax: An Overview
• Founded in April 2010
• Home to Apache Cassandra Chair & most committers
• DataStax Enterprise – 'Certified for Production' Big Data platform
• 300+ customers
• 100+ employees
• Headquartered in San Francisco Bay area
• European HQ in London, UK
• Funded by prominent venture firms
21. What Innovation?
• Production-certified Cassandra
• Round-the-clock support by the world's experts
• Your big data system is easy to manage
• Satisfy your top security officer
• Search and analyze your hot data in context
22. Ask Different Things of Your Hot Data
[Diagram: DataStax Enterprise multi-data center deployment handling Write/Read, Analyze (Hadoop) and Search (Solr) workloads]
23. With the Security You Need
[Diagram: Write/Read, Analyze (Hadoop) and Search (Solr) workloads]
24. Into the Mainstream
“Security is very important to us,
so we're naturally very pleased
to see all the new security
features in DataStax Enterprise
3. Its scalability and
performance are enabling us to
develop an exciting financial
data analytics platform that will
create a better experience for
our audience.”
25. Managed From a Single Pane
Provision
Monitor
Plan
Optimize
Recover
26. CALL FOR PAPERS
Cassandra Summit Europe 2013
CALL FOR PAPERS
SPONSORSHIP OPPORTUNITY
TWO DAYS
30+ SESSIONS
TRAINING DAY
London Barbican 2013
Editor's Notes
Always available, target new regions quickly, need elasticity as traffic ebbs and flows
In the old days, all your structured data storage requirements went into a classic RDBMS like DB2, Oracle, Sybase or Informix. Relational databases were great! You could define data structures, insert into them, join them to your heart's content and query them to give you the bits of data you wanted. After a while your joins become a massive chain and your systems start slowing down! Change the data structure to perform better? Too risky! Noooo, just buy more kit to make it perform!
Then there is the multi data centre configuration. What did we use to do in the past? SAN replication? Oracle replication? There were a lot of operational headaches there.
Anyway, while Larry was busy eating his hot dog, TCP/IP became more popular, ISPs started appearing, Mosaic browsers started appearing on people's home PCs, and the whole internet revolution happened! So clever guys such as Larry and Sergey, and brave guys like Jeff Bezos, were busy building global scale services. They hired a whole bunch of really clever engineers and gave them the global scale problems to solve. How many of you here have had a boss say, "I've got an e-commerce website, write a database engine that scales globally"? That was the sort of bold move these guys made back then. And what's good about them is that rather than keeping all of this as their trade secret, they started publishing papers on their implementations.
Then this guy came along, stole Amazon Dynamo DB engineer Avinash Lakshman, and built Cassandra, and eventually had it open sourced.
Next slide