DataStax: The Whys of NoSQL
2.
1 Jargon Galore
2 Schema
3 Modeling and Internals
4 Deployment
5 Conclusion
© 2015. All Rights Reserved.
5. Schema
©2015 DataStax Confidential. Do not distribute without consent.
Rigid Schema: inflexible
Schema Free: BLOBs; too slow
Schema on read: writes are schema free, reads are freaking slow
Schema Easy to change: reads/writes are schema aware; schema changes are O(1) operations
Optimized for agility of change when needed, not theoretical extremes
6. Normalization, Joins, Referential Integrity
Database normalization is the process of organizing the columns (attributes) and tables (relations) of a relational database to minimize data redundancy.
Referential integrity is a property of data which, when satisfied, requires every value of one column of a table to exist as a value of another column in a different table.
A JOIN is a means for combining fields from two tables (or more) by using values common to each.
Source - https://en.wikipedia.org/
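The three definitions above can be seen together in a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in relational database; the table names (customers, orders), the column names, and the helper functions are hypothetical and chosen only for illustration.

```python
import sqlite3

def build_demo():
    """Create an in-memory database with two normalized tables (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    # Normalization: the customer name is stored once, not repeated per order.
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(id),"
        " item TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
    conn.executemany(
        "INSERT INTO orders (id, customer_id, item) VALUES (?, ?, ?)",
        [(10, 1, "disk"), (11, 1, "ssd")],
    )
    return conn

def orders_with_names(conn):
    """JOIN: combine fields from both tables via the shared customer id."""
    return conn.execute(
        "SELECT o.id, c.name, o.item FROM orders AS o"
        " JOIN customers AS c ON c.id = o.customer_id ORDER BY o.id"
    ).fetchall()

def try_orphan_order(conn):
    """Referential integrity: an order pointing at a missing customer is rejected."""
    try:
        conn.execute("INSERT INTO orders (id, customer_id, item) VALUES (12, 99, 'tape')")
    except sqlite3.IntegrityError:
        return False
    return True
```

Calling orders_with_names(build_demo()) returns one row per order with the customer name attached at read time, which is the cost the later slides weigh against cheap disk.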
7. Not all Data Access is equal
Disk: 1:168K random vs. sequential
Memory: 1:10 random vs. sequential
Source - https://queue.acm.org/detail.cfm?id=1563874
8. Disk Density
Source - http://silvertonconsulting.com/blog/2010/04/22/save-the-planet-buy-fatter-disks-and-flash/#sthash.sh2nwqtX.dpbs
9. Disk Price / GB
[Chart: HDD price per GB by year, 1980 to 2014, on a logarithmic scale from $0.01 to $1,000,000.00]
Minimize Data Redundancy?
10. C* Read and Write paths
[Diagram: Commit Log, Memtable 1 to Memtable N, and SSTable 1 to SSTable N, laid out across in-process memory, off-heap memory, the OS cache, and persistent storage. Arrows are marked as random access or sequential access. Other labels: Accounting.]
Writes: go to the commit log and a memtable
Reads: memtable + N SSTables, where N >= 1
Mandatory flush: a new memtable is created during the flush operation (cleanup tombstones, cleanup token ranges, etc.)
Time: memtable_flush_in_ms controls the flush frequency
Max # of SSTables = N (based on compaction)
Compaction: SSTables are merged into a compacted SSTable
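The read and write paths on this slide can be sketched as a toy model. This is a minimal illustration under stated assumptions, not Cassandra's implementation: the class name ToyLSM, the flush_threshold knob, and the use of plain dicts for memtables and SSTables are all hypothetical, and tombstones, bloom filters, and timed flushes are left out.

```python
class ToyLSM:
    """Toy model of the write and read paths; not Cassandra's implementation."""

    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold  # hypothetical knob: entries per memtable
        self.commit_log = []   # append-only, models sequential writes
        self.memtable = {}     # in-memory and mutable
        self.sstables = []     # immutable once written, oldest first

    def write(self, key, value):
        # No read before write: append to the log, then update memory.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Persist the memtable as a sorted SSTable and start a fresh memtable.
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}
            # In this toy model, flushed data no longer needs log replay.
            self.commit_log.clear()

    def read(self, key):
        # Check the memtable first, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None

    def compact(self):
        # Merge all SSTables into one; newer values overwrite older ones.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [dict(sorted(merged.items()))] if merged else []
```

In this sketch a read may have to consult every SSTable, so the number of SSTables bounds read cost, and compact() is what keeps that number down.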
12. Key takeaways
Optimal utilization of physical resources (random access, sequential IO and CPU)
No Read before Write (well mostly!)
Plan for Compaction (like commercial paper, you need a regular payback)
Denormalize for optimal application response (use 2NF instead of 3NF)
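The last takeaway can be illustrated with a small sketch in plain Python. The data, the field names, and the function names are hypothetical. The 3NF-style layout keeps the customer name in one place and joins at read time. The denormalized layout copies the name into every order row, which a 2NF design permits, so a read becomes a single keyed lookup.

```python
# 3NF-style layout: the customer name lives in one place and is joined at read time.
customers = {1: {"name": "Ada"}}
orders = [
    {"order_id": 10, "customer_id": 1, "item": "disk"},
    {"order_id": 11, "customer_id": 1, "item": "ssd"},
]

def orders_for_customer_normalized(customer_id):
    """Join at read time: filter the orders, then attach the customer name."""
    name = customers[customer_id]["name"]
    return [
        (o["order_id"], name, o["item"])
        for o in orders
        if o["customer_id"] == customer_id
    ]

def denormalize(customers, orders):
    """Build a query-shaped table keyed by customer, copying the name into each row."""
    by_customer = {}
    for o in orders:
        name = customers[o["customer_id"]]["name"]  # redundant copy, paid for in disk space
        by_customer.setdefault(o["customer_id"], []).append(
            (o["order_id"], name, o["item"])
        )
    return by_customer

orders_by_customer = denormalize(customers, orders)

def orders_for_customer_denormalized(customer_id):
    """One keyed lookup, no join."""
    return orders_by_customer.get(customer_id, [])
```

Both functions return the same rows for the same customer. The trade is extra storage and extra work when a name changes, in exchange for a cheaper read.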
13. Deployment Semantics
[Diagram: deployment options, with labels Single Box, Scale Up, Sharding, Replication, DR, GR, and GR + DR; nodes marked R/W or R; data centers DC1 and DC2; locations San Francisco, New York, and Stockholm.]
14. Linear Scaling
http://www.datastax.com/apache-cassandra-leads-nosql-benchmark
End Point Report Excerpt: Balanced Read/Write YCSB Test
15. So what's the catch?
16. Conclusion
Best-in-class performance, backed by physics
Enables pragmatic business agility
Delivers a delightful customer experience
Always-on, linear-scale architecture delivering optimal ROI