Andrey Zaychikov, Solutions Architect, EMEA
21.02.2017
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks

Learn how to optimize your NoSQL database on AWS for cost, efficiency, and scale. NoSQL databases are great for modern datasets that require simplicity in design, handle structured and unstructured data, scale horizontally, and offer finer control over availability. With AWS, you have options for running NoSQL on Amazon EC2 with Amazon EBS or on Amazon DynamoDB. This webinar will dive deep into best practices and architectural considerations for designing and managing NoSQL databases like Cassandra, MongoDB, CouchDB, and Aerospike on EC2 and EBS. We will share best practices around instance and volume selection, provide performance tuning hints, and describe cost optimization techniques.

Learning Objectives:
• Learn about common NoSQL database options and use cases for Cassandra, MongoDB, CouchDB, and Aerospike
• Review best practices around architecting on AWS for different NoSQL databases
• Understand the cost vs. performance of different Amazon EC2 instances and Amazon EBS volumes

Published in: Technology

  1. Andrey Zaychikov, Solutions Architect, EMEA. 21.02.2017. Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS
  2. Typical algorithm for choosing the right options for NoSQL DB deployments
  3. What will we cover today?
  4. How do these databases differ? DynamoDB, cloud-based, self-managed (EC2), key-value, document-oriented, graph
  5. Cassandra
  6. What is it?
     • Dynamo model database + CQL
     • Horizontally scalable
     • No single point of failure
     • Data is immutable and stored in collections
     • JVM based
     • A lot of management work is done in the background
     • Relies on the gossip protocol
  7. Main concerns of the customers: schema & usage pattern; geo distribution; background routines & specific optimizations
  8. How does it work?
  9. Choosing instance & storage capacity: 80% Writes
     • For most workloads (especially with a 50/50 read/write ratio), M4 with EBS is the best option
     • For write-heavy workloads with high RPS requirements, C4 with EBS should be considered
     • When the performance requirements are high and the dataset is relatively small, you can use I2 with ephemeral storage
  10. Choosing instance & storage capacity: 80% Reads
      • For most workloads, M4 with EBS is a good choice
      • When the performance requirements are high and the dataset is relatively small, you can use I2 with ephemeral storage
      • When the performance requirements are high and the dataset is large, the best option is R4 with different EBS flavors
  11. FAQ: 2AZ cluster architecture. Hint: RetryPolicy for Cassandra Driver
  12. FAQ
      • Cassandra backup / restore: restoring the whole cluster can be complicated; a single node can be restored from EBS snapshots
      • Auto Scaling of Cassandra clusters: auto-scaling puts unpredictable pressure on the cluster; scaling up is simple, but scaling down is extremely complicated
      • Cassandra in containers: makes sense only for test / dev environments
  13. FAQ: Troubleshooting: JVM, caching, compaction, disk I/O, CPU, memory
  14. MongoDB
  15. What is it?
      • Document-oriented database
      • Horizontally scalable
      • HA is based on master / slave replication
      • Geo-distributed
      • A lot of management work is done in the background
  16. Main concerns of the customers: schema & usage pattern; geo distribution and performance; data consistency & partition tolerance
  17. How does it work?
  18. Choosing instance & storage
      • MongoDB needs a lot of memory and really fast disks, so unless your dataset is quite big the best option is either R3 or I2 (depending on the size of the dataset)
      • If the dataset is big, consider R4 with different EBS flavors
      • For hidden nodes, use M4 with EBS, since EBS snapshots make it easy to back up data
  19. FAQ: 2AZ cluster architecture. Best option: replica set in one AZ and a hidden member in the other.
  20. FAQ
      • MongoDB backup / restore: hidden nodes with EBS and EBS snapshot backups
      • Querying large amounts of data: design the schema properly; avoid using MapReduce on the master
      • MongoDB consistency: a lot of improvements were made, but there are still some edge cases
  21. FAQ: Troubleshooting: mongos performance, long-running queries, fragmentation, disk I/O, CPU, memory
  22. CouchDB
  23. What is it?
      • Document-oriented database built on the Dynamo model
      • Supports a RESTful API
      • Eventual consistency
      • Lockless, optimistic, with conflict resolution
      • Horizontally scalable (with constraints)
      • Offline-first database
      • MapReduce to prepare views
  24. How does it work?
  25. Choosing instance & storage
  26. FAQ: 2AZ cluster architecture
      • You have to plan the replication schema yourself, so it is your responsibility to check how it will behave in a DR event
  27. FAQ: proper replication schema; indexed views and their performance; proxy for requests
  28. Aerospike
  29. What is it?
      • In-memory key-value database
      • High and constant performance
      • Shared-nothing architecture
      • Geo-distributed (hash partitions)
      • Master-slave replication
  30. How does it work?
  31. Choosing instance & storage
      • Aerospike is used when the performance requirements are extreme. It needs a lot of memory and very fast disks, which is why EC2 with ephemeral storage is the first choice for Aerospike deployments.
  32. FAQ: 2AZ cluster architecture
      • If one AZ goes down, then depending on your replication factor you will still have a copy of the data
      • Aerospike can add more nodes and replicate data to them without putting much pressure on the existing nodes
      • It takes time to replicate data
  33. FAQ
      • Aerospike backup / restore: restoring the whole cluster can be complicated; a single node can be restored from EBS snapshots
      • Auto Scaling of Aerospike clusters: auto-scaling puts unpredictable pressure on the cluster; scaling up is simple, but scaling down is complicated
      • Aerospike in containers: does not make any sense
  34. FAQ: Troubleshooting: disk I/O, CPU, memory
  35. What is it? (Neo4j)
      • Graph database
      • JVM based
      • Provides a REST API
      • Two clustering modes: HA cluster & Causal cluster
      • Two types of nodes: Core nodes & Read replicas (Raft protocol)
      • Uses the Cypher language for querying
      • Diagram: Neo4j Causal Clustering
  36. How does it work?
  37. Choosing instance & storage
  38. FAQ: 2AZ cluster architecture
      • If an AZ fails and the master node was in it, a new master election is initiated
      • Core nodes in Causal cluster mode vote by simple majority
      • If a majority is unavailable, the cluster becomes read-only
  39. FAQ: Troubleshooting: JVM, page caching, disk I/O, CPU, memory
  40. NoSQL on EC2: Cost considerations
  41. General cost considerations: usage pattern (R/W), RPS, size of the dataset, traffic costs, object size, number of nodes
  42. Cost: Performance / Size
      • If you want to stay cost-effective and efficient, then deployment is a journey
      • Consider EBS as the main option for most workloads
      • If your performance requirements are really high and the dataset is relatively small, consider EC2 with ephemeral storage; otherwise, go for EC2 with EBS
  43. Sum up
      • There is no general solution for all cases
      • Context matters, and the solution should follow the changing context
      • Apps and code should be adapted to the way NoSQL DBs work
      • The initial choice of deployment options can be changed
      • The best way to make the initial deployment choice is a PoC
  44. Thank you!
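
The Cassandra sizing guidance on slides 9 and 10 can be read as a small decision rule. The sketch below is one possible encoding in Python, not something taken from the talk: the `Workload` fields and the function name are hypothetical, the slides give no numeric thresholds for "high performance" or "relatively small", and checking the small-dataset case before the write-heavy case is an assumption, since the slides do not rank those two options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a Cassandra workload (not from the slides)."""
    write_heavy: bool        # True for roughly 80% writes, False for roughly 80% reads
    high_performance: bool   # assumed flag for "performance requirements are high"
    small_dataset: bool      # assumed flag for "dataset is relatively small"


def suggest_cassandra_option(w: Workload) -> str:
    """Return the instance/storage pairing suggested on slides 9 and 10."""
    if not w.high_performance:
        # Default for most workloads on both slides.
        return "M4 with EBS"
    if w.small_dataset:
        # High performance, small dataset (assumed to take priority).
        return "I2 with ephemeral storage"
    if w.write_heavy:
        # Write-heavy with high RPS requirements (slide 9).
        return "C4 with EBS"
    # Read-heavy, high performance, large dataset (slide 10).
    return "R4 with EBS"
```

For example, `suggest_cassandra_option(Workload(write_heavy=False, high_performance=True, small_dataset=False))` returns `"R4 with EBS"`. As the summary slide says, the initial choice can change, so a rule like this is only a starting point for a PoC.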
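
Slide 38 states that core nodes vote by simple majority and that the cluster becomes read-only when a majority is unavailable. The arithmetic behind a two-AZ layout can be sketched as follows. This is an illustration in plain Python, not Neo4j code: the function names and AZ labels are hypothetical, and it assumes every core node in the failed AZ becomes unreachable at once.

```python
from typing import Dict


def majority(core_nodes: int) -> int:
    """Smallest number of core nodes that forms a simple majority."""
    return core_nodes // 2 + 1


def stays_writable(core_nodes: int, reachable: int) -> bool:
    """True if the reachable core nodes still form a majority."""
    return reachable >= majority(core_nodes)


def writable_after_az_loss(cores_per_az: Dict[str, int]) -> Dict[str, bool]:
    """For each AZ, report whether the cluster stays writable if that AZ fails."""
    total = sum(cores_per_az.values())
    return {
        az: stays_writable(total, total - count)
        for az, count in cores_per_az.items()
    }


# Hypothetical layout: three core nodes split 2 + 1 across two AZs.
layout = {"az-a": 2, "az-b": 1}
result = writable_after_az_loss(layout)
```

With this layout, losing `az-b` leaves two of three core nodes, which is still a majority, while losing `az-a` leaves one and the cluster would become read-only. With only two AZs, the AZ holding at least half of the core nodes is always in this position, which is worth factoring into a DR plan.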
