Designing Resilient Application Platforms with Apache Cassandra - Hayato Shimizu (DataStax)
1. Building Highly Available Services Using Cassandra
USE jax_london;
SELECT * FROM presenters WHERE name = 'Hayato Shimizu';

 name           | title               | company  | area
----------------+---------------------+----------+------
 Hayato Shimizu | Solutions Architect | DataStax | EMEA
2. Apache Cassandra
• Created by Avinash Lakshman and Prashant Malik at Facebook
• Amazon Dynamo + Google BigTable
• Highly distributed database with data replication for redundancy
• Active-Active Multi DC, master-less design – no single point of failure
• High throughput
• Linearly scalable – volume, throughput
• Used by many mission critical applications and services
• 2.0 is out!
©2013 DataStax Confidential. Do not distribute without consent.
3. C* Architecture – Data Replication
• Token Range 0 -> 2^127 - 1 in Ring Formation
• Consistent Hashing Algorithm
• Replica nodes are placed clockwise around the ring
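As a rough illustration of the ring described above, the following Python sketch hashes a partition key onto the token ring and walks clockwise to pick replicas. The node names, the evenly spaced tokens, and the use of MD5 reduced modulo 2^127 as a stand-in partitioner are assumptions made for this example, not Cassandra's actual implementation.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # tokens run from 0 to 2**127 - 1, as on the slide


def token_for(key: str) -> int:
    """Map a partition key to a token (MD5 here is an illustrative stand-in)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def build_ring(nodes):
    """Give hypothetical nodes evenly spaced tokens; returns sorted (token, node) pairs."""
    step = RING_SIZE // len(nodes)
    return sorted((i * step, node) for i, node in enumerate(nodes))


def replicas_for(key, ring, replication_factor):
    """Find the first node whose token is >= the key's token, then walk clockwise."""
    tokens = [token for token, _ in ring]
    # Wrap around to the first node when the key's token is past the last node.
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


ring = build_ring(["node1", "node2", "node3", "node4"])
print(replicas_for("Hayato Shimizu", ring, 3))
```

The same key always lands on the same set of nodes, and adding a node only changes ownership of the token range next to it, which is the property consistent hashing is chosen for.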
4. C* Architecture - No Single Point of Failure
• The client load balances requests across nodes
• Do not use a hardware LB
5. C* Architecture - Multi DC Replication
6. C* Architecture – Data Consistency
• C* offers TUNABLE consistency
• Client decides consistency per query
• ANY, ONE, TWO, THREE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, ALL
• QUORUM = floor(replication_factor / 2) + 1
• Replication Factor = 3 can maintain Quorum with tolerance of 1 node failure
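The quorum formula above uses integer division. A minimal Python sketch of the arithmetic follows; the function names are made up for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM is still reachable."""
    return replication_factor - quorum(replication_factor)


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerated failures={tolerated_failures(rf)}")
```

For RF=3 this gives a quorum of 2 and a tolerance of 1 node, matching the slide. Note that RF=2 tolerates no failures at QUORUM, since both replicas are needed.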
7. Setting Up Cassandra for Multi DC
On each node – edit the following file:
conf/cassandra-rackdc.properties
With the following entry:
dc=DC1
rack=RACK1
On each node – edit the following file:
conf/cassandra.yaml
With the following entry:
endpoint_snitch: GossipingPropertyFileSnitch
Create keyspace:
CREATE KEYSPACE new_keyspace WITH
replication = {'class': 'NetworkTopologyStrategy', 'DC1' : 3, 'DC2' : 3};
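To see what the keyspace above implies for the quorum-style consistency levels, here is a small Python sketch. It assumes the usual definitions: QUORUM counts replicas across all DCs, LOCAL_QUORUM counts only the coordinator's DC, and EACH_QUORUM requires a quorum in every DC. The function names and the "any" key are hypothetical.

```python
replication = {"DC1": 3, "DC2": 3}  # mirrors the CREATE KEYSPACE statement above


def quorum(replicas: int) -> int:
    """floor(replicas / 2) + 1"""
    return replicas // 2 + 1


def required_acks(level: str, replication: dict, local_dc: str) -> dict:
    """Return how many replica acknowledgements are needed, keyed by DC.

    For QUORUM the count is cluster-wide, so it is reported under the key "any".
    """
    if level == "LOCAL_QUORUM":
        return {local_dc: quorum(replication[local_dc])}
    if level == "EACH_QUORUM":
        return {dc: quorum(rf) for dc, rf in replication.items()}
    if level == "QUORUM":
        return {"any": quorum(sum(replication.values()))}
    raise ValueError(f"level not covered by this sketch: {level}")


for level in ("LOCAL_QUORUM", "EACH_QUORUM", "QUORUM"):
    print(level, required_acks(level, replication, local_dc="DC1"))
```

With three replicas in each of two DCs, QUORUM needs 4 of the 6 replicas and so must cross the WAN, while LOCAL_QUORUM needs only 2 replicas in the local DC.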
8. C* Architecture – Data Centre Configuration
[Diagram: nodes each holding a subset of the replicated data: (Data 1, Data 3, Data 4), (Data 2, Data 3, Data 4), (Data 1, Data 2, Data 4), (Data 1, Data 2, Data 3)]
9. Cassandra Architecture - Writes
[Diagram: INSERT INTO… is written to the commit log and the memtable; the memtable is later flushed to an SSTable]
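The write path in the diagram can be sketched as a toy model in Python. This is a simplification for illustration only: the class name, the flush threshold, and clearing the whole commit log on flush are assumptions, not how Cassandra manages commit log segments.

```python
class ToyWritePath:
    """Toy model of the write path: commit log, memtable, then SSTable flush."""

    def __init__(self, flush_threshold: int = 3):  # threshold is a made-up value
        self.commit_log = []   # append-only record kept for durability
        self.memtable = {}     # in-memory, latest value per key
        self.sstables = []     # immutable, sorted snapshots of flushed memtables
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. append to the commit log
        self.memtable[key] = value            # 2. update the memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # 3. write the memtable out as a sorted, immutable SSTable
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replaying


store = ToyWritePath(flush_threshold=2)
store.write("name", "Hayato Shimizu")
store.write("company", "DataStax")
print(store.sstables)
```

Because every write is a sequential append plus an in-memory update, no read or random disk seek is needed on the write path.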
12. Single Data Centre
• Resiliency through C* Data Replication
13. Multi DC – Active/Passive
• Wasteful
• Do you test this? Does it actually work when it fails over?
• What is the decision point for failing over?
• Do you try and fix your problem in the active DC?
• Is it a manual process?
• How long does it take to failover to passive DC?
• How many people and which departments will need to be involved?
• Incident managers?
14. Active-Everywhere is the Norm
[Diagram: Cloud and Datacenter]
Source: (http://www.datastax.com/resources/whitepapers/bigdata)
15. Design Considerations - Active-Everywhere DC Strategies
• 24 x 7 services are what businesses and consumers now expect
• Service failure costs money and reputational damage
• 99.999+% service uptime?
• Data Replication Strategies
• Consistent data replication across all DCs
• Eventually consistent replication across DCs
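To put the 99.999% uptime target above in concrete terms, this short Python sketch converts an availability percentage into a yearly downtime budget. It assumes a 365-day year; the function name is made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Downtime budget per year for a given availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.2f} minutes/year")
```

Five nines leaves roughly 5.26 minutes of downtime per year, which is less than most manual failover procedures take.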
16. Design Considerations - Data Replication Strategies
• Latency is not going away – embrace it
• Possible Solutions
• Sharded users
• Full data consistency in all DCs
• Eventual consistency to other DCs
17. Design Considerations - Full Consistency Across All Data Centres
• Does your service really require this?
• Performance considerations
• Think about your service usage patterns
• Failure scenarios
• WAN Link failure
18. Design Considerations - Eventual Consistency Across DCs
• Identify data access patterns for each service
• Data access patterns
• Write-Only
• Read-Only
• Mixture of both
• Access frequency
19. Design Considerations - Failure Scenarios
• Data centre total failure – natural disaster, power, etc
• Network storm
• Network kit firmware upgrade failure
• SAN Upgrades – wrong Fibre Channel cable pulled out
• WAN link failure
• Service dependency failure
• Etc, etc
• Failure probabilities - do your maths!
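One way to "do the maths" is sketched below in Python. It assumes failures are independent, which the scenarios above (shared WAN links, shared dependencies) often violate, so treat the results as optimistic. The probabilities used are invented for illustration.

```python
from math import comb


def prob_all_down(p_down: float, count: int) -> float:
    """Probability that all `count` DCs (or nodes) are down at the same time."""
    return p_down ** count


def prob_at_least_up(n: int, needed: int, p_up: float) -> float:
    """Probability that at least `needed` of `n` replicas are up (binomial)."""
    return sum(
        comb(n, k) * p_up ** k * (1 - p_up) ** (n - k)
        for k in range(needed, n + 1)
    )


# Hypothetical figures: each DC is unavailable 1% of the time.
print(f"both of 2 DCs down: {prob_all_down(0.01, 2):.6f}")
# Hypothetical figures: each replica is up 99% of the time, RF=3, quorum of 2.
print(f"quorum of 3 replicas reachable: {prob_at_least_up(3, 2, 0.99):.6f}")
```

Under these assumptions, a quorum of 2 out of 3 replicas is reachable about 99.97% of the time even though each replica alone is only 99% available.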
20. User Session Persistence to One DC
[Diagram: Session 1 goes to the Service and C* in DC1, Session 2 to the Service and C* in DC2; async replication between C* in DC1 and C* in DC2]
21. DC Session Persistence Technique 1
• GTM – Global Traffic Management
• DNS based solution
• Hardware / SaaS solutions
• Traffic weighting for each DC
• Persistence guaranteed in a private network using hardware
• Internet-facing persistence is slightly more difficult – DNS RFC spec
22. DC Session Persistence Technique 2
• A famous company providing edge based load balancing
• Users connect via their service
• Cookie / query string based
[Diagram: users connect over https through the Edge Load Balancer to DC1 and DC2; async replication between DC1 and DC2]
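The two persistence techniques (traffic weighting per DC and cookie-based pinning) can be combined in a small routing sketch. This is a hypothetical Python illustration of the idea, not the behaviour of any particular GTM or edge load balancing product; the cookie name "dc" and the weights are assumptions.

```python
import random

DC_WEIGHTS = {"DC1": 50, "DC2": 50}  # hypothetical traffic weights per DC


def choose_dc(cookies: dict, healthy: set, rng=random) -> str:
    """Pin a user to the DC named in their cookie; otherwise pick by weight."""
    pinned = cookies.get("dc")  # "dc" is a made-up cookie name
    if pinned in healthy:
        return pinned  # keep the session on the DC it started in
    candidates = [dc for dc in DC_WEIGHTS if dc in healthy]
    if not candidates:
        raise RuntimeError("no healthy data centre available")
    weights = [DC_WEIGHTS[dc] for dc in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


print(choose_dc({"dc": "DC2"}, healthy={"DC1", "DC2"}))  # stays pinned to DC2
print(choose_dc({"dc": "DC2"}, healthy={"DC1"}))         # DC2 is down, so DC1
```

When the pinned DC is unhealthy, the user is moved to another DC, where asynchronous replication should already have delivered their session data.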
24. Application Tier Resilience
• Make it fault tolerant – stateless.
• Make it horizontally scalable
• Load balancer stickiness – really?
• Use C* to store sessions - sessions will recover in a DR scenario.
[Diagram: several App Tier instances all see Session1 through Cassandra Replication]
25. Seamless Application Releases & System Maintenance
• 99.999+% SLA includes maintenance!
• C* rolling upgrades
• Kernel patching etc
• Schema Changes – C* will help
• Code must now handle multiple versions of the data structure
• Code deployment - statelessness will help here again!
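As one way to picture code that handles several data structure versions during a rolling release, here is a minimal Python sketch. It assumes a column named "area" was added in a later schema version, so rows written earlier return no value for it; the function name and the default are hypothetical.

```python
def read_presenter(row: dict) -> dict:
    """Normalise a row that may predate a schema change."""
    return {
        "name": row["name"],
        # "area" is assumed to be a newer column; older rows have no value for it
        "area": row.get("area") or "unknown",
    }


old_row = {"name": "Hayato Shimizu"}                  # written before the change
new_row = {"name": "Hayato Shimizu", "area": "EMEA"}  # written after the change
print(read_presenter(old_row))
print(read_presenter(new_row))
```

Because both old and new application versions can read both row shapes, the schema change and the code deployment do not have to happen at the same instant.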
27. Embracing the Cloud
• High demand can kill your service – make it scalable
• Bursting into the cloud for peak load
• Flexible provisioning model
• DR on the cheap
29. Conclusion
• Developers - think about your infrastructure. Don’t just leave it to the Ops or DevOps teams.
• Ops / DevOps Engineers – think about the applications and learn how they work.
• Collaborate with each other.
• Building out resilient infrastructure is not that hard. It just requires some thought, communication, and execution.
• Think about scale.
• Keep IT Simple
• Use great tools like Cassandra