5. Data Tier
Cache, Data Warehouse, Blob Store
RDBMS, NoSQL, Search
Cloud Data Tier Architecture
App/Web Tier
Client Tier
best database for
each workload
6. Workload Driven Data Store Selection
Data Tier
Cache, Data Warehouse, Blob Store
RDBMS, NoSQL, Search
logging
rich search
key/value
simple query
hot reads
analytics
complex queries & transactions
7. AWS Services for the Data Tier
Data Tier
Amazon DynamoDB
Amazon RDS
Amazon ElastiCache
Amazon S3
Amazon CloudSearch
Amazon Redshift
logging
rich search
key/value
simple query
hot reads
analytics
complex queries & transactions
9. DynamoDB is a managed
NoSQL database service.
Store and retrieve any amount of data
Serve any level of request traffic
16. RDBMS = Default Choice
• An Amazon.com page is composed of responses from 1000s of
independent services
• Query patterns differ from service to service
Catalog service is usually heavy key-value
Ordering service is very write intensive (key-value)
Catalog search has a different pattern for querying
Relational Era @ Amazon.com
RDBMS
Poor Availability, Limited Scalability, High Cost
17. Dynamo = NoSQL Technology
• Replicated DHT
• Consistent hashing
• Optimistic replication
• Quorum strategies
• Anti-entropy mechanisms
• Object versioning
Distributed Era @ Amazon.com
lack of strong consistency
every engineer needs to learn distributed systems
operational complexity
18. DynamoDB = NoSQL Cloud Service
Cloud Era @ Amazon.com
Seamless Scalability
Fast & Predictable Performance
Easy Administration
Streamlined Development
Cost Effective
19. partitions
1 .. N
table
• DynamoDB automatically
partitions data by the hash key
Hash key spreads data (& workload)
across partitions
• Auto-partitioning occurs with
Data set size growth
Provisioned capacity increases
Massive and Seamless Scale
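The idea of spreading items across partitions by hash key can be illustrated with a minimal sketch. DynamoDB's internal hash function and partition layout are not described in these slides, so the SHA-256-modulo scheme, the function names, and the `user-N` keys below are assumptions made purely for illustration.

```python
import hashlib
from collections import Counter


def partition_for(hash_key: str, num_partitions: int) -> int:
    """Map a hash key to a partition index (illustrative, not DynamoDB's real scheme)."""
    digest = hashlib.sha256(hash_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def items_per_partition(hash_keys, num_partitions: int) -> Counter:
    """Count how many hash keys land on each partition."""
    return Counter(partition_for(key, num_partitions) for key in hash_keys)


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(10_000)]  # hypothetical hash keys
    for partition, count in sorted(items_per_partition(keys, 4).items()):
        print(f"partition {partition}: {count} keys")
```

With many distinct keys, each partition ends up holding a similar share of the items, which is the property the slide relies on.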
20. WRITES
Continuously replicated to 3 Facilities
Quorum acknowledgment
Persisted to disk (SSD)
READS
Strongly or eventually consistent
No trade-off in latency
Durable At Scale
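A rough way to picture "quorum acknowledgment" across three facilities is the in-memory sketch below. The slide does not state the quorum size; a simple majority (2 of 3) is assumed here, and the `Replica` class and `quorum_write` function are hypothetical names, not part of any DynamoDB API.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data in a single facility (hypothetical model)."""
    name: str
    healthy: bool = True
    store: dict = field(default_factory=dict)

    def write(self, key, value) -> bool:
        """Persist the value and acknowledge, or refuse when unhealthy."""
        if not self.healthy:
            return False
        self.store[key] = value
        return True


def quorum_write(replicas, key, value) -> bool:
    """Return True when a majority of replicas acknowledged the write."""
    acks = sum(1 for replica in replicas if replica.write(key, value))
    return acks >= len(replicas) // 2 + 1


replicas = [Replica("facility-a"), Replica("facility-b"), Replica("facility-c")]
replicas[2].healthy = False  # one facility down: 2 of 3 acks still succeed
succeeded = quorum_write(replicas, "item-1", {"status": "ok"})
```

The sketch ignores what a real system does with the copies written during a failed quorum; it only shows why one unavailable facility does not block writes.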
21. Provisioned Throughput
• Request-based capacity provisioning model
• Throughput is declared and updated via the API or the
console
CreateTable (foo, reads/sec = 100, writes/sec = 150)
UpdateTable (foo, reads/sec=10000, writes/sec=4500)
• DynamoDB handles the rest
Capacity is reserved and available when needed
Scaling-up triggers repartitioning and reallocation
No impact to performance or availability
Predictable Performance
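The CreateTable and UpdateTable pseudocode above maps roughly onto request parameters like the ones built below. The parameter names (`TableName`, `KeySchema`, `AttributeDefinitions`, `ProvisionedThroughput`) follow the DynamoDB API; the helper functions and the hash key attribute `id` are assumptions, since the slide does not name a key. The sketch only builds dictionaries, so it runs without AWS credentials; with boto3 they could be passed as `client.create_table(**params)` and `client.update_table(**params)`.

```python
def create_table_params(table, hash_key, reads_per_sec, writes_per_sec):
    """Build CreateTable parameters for a hash-key-only table."""
    return {
        "TableName": table,
        # Attribute type "S" (string) is an assumption for this example.
        "AttributeDefinitions": [{"AttributeName": hash_key, "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": hash_key, "KeyType": "HASH"}],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads_per_sec,
            "WriteCapacityUnits": writes_per_sec,
        },
    }


def update_table_params(table, reads_per_sec, writes_per_sec):
    """Build UpdateTable parameters that change only the provisioned throughput."""
    return {
        "TableName": table,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads_per_sec,
            "WriteCapacityUnits": writes_per_sec,
        },
    }


create = create_table_params("foo", "id", 100, 150)  # "id" is a hypothetical key name
update = update_table_params("foo", 10000, 4500)
```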
22. WRITES
Continuously replicated to 3 Facilities
Quorum acknowledgment
Persisted to disk (SSD)
READS
Strongly or eventually consistent
No trade-off in latency
Low Latency At Scale
23. Making life easier for developers…
• Developers are freed from:
Performance tuning (latency)
Automatic 3-way multi-facility replication
Scalability (and scaling operations)
Security inspections, patches, upgrades
Software upgrades, patches
Automatic hardware failover
Improving the underlying hardware
…and more!
Automated Operations
30. Hash = Distribution Key
partition 1..N
hash keys
mandatory for all items in a table
key-value access pattern
determines data distribution
31. Hash = Distribution Key
large number of unique hash keys
+
uniform distribution of workload across hash keys
= optimal schema design
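Why both conditions matter can be seen by comparing request load per partition for an evenly spread workload and for one dominated by a single hot key. This is a sketch under assumptions: the hashing scheme, the four partitions, and the synthetic request lists are invented for illustration and do not reflect DynamoDB internals.

```python
import hashlib
from collections import Counter


def partition_for(hash_key: str, num_partitions: int) -> int:
    """Illustrative hash-to-partition mapping (an assumption, not DynamoDB's)."""
    digest = hashlib.sha256(hash_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def busiest_partition_share(requests, num_partitions: int) -> float:
    """Fraction of all requests served by the most heavily loaded partition."""
    load = Counter(partition_for(key, num_partitions) for key in requests)
    return max(load.values()) / len(requests)


# Many unique hash keys, each requested once.
uniform = [f"user-{i}" for i in range(10_000)]
# Same request volume, but 90% of it targets a single hot key.
skewed = ["user-0"] * 9_000 + [f"user-{i}" for i in range(1, 1_001)]

uniform_share = busiest_partition_share(uniform, 4)
skewed_share = busiest_partition_share(skewed, 4)
```

In the uniform case the busiest partition handles close to a quarter of the traffic; in the skewed case one partition handles at least 90% of it, no matter how many partitions exist.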
32. Range = Query
range
hash
range keys
model 1:N relationships
enable rich query capabilities
composite primary key
all items for a hash key
==, <, >, >=, <=
“begins with”
“between”
sorted results
counts
top / bottom N values
paged responses
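The query capabilities listed above can be mimicked with a small in-memory model of a table with a composite primary key. This is only a sketch: the `Table` class, the helper predicates, and the customer/date data are hypothetical, and a real query is evaluated by the service rather than by client-side filtering.

```python
class Table:
    """In-memory stand-in for a table with a hash key and a range key."""

    def __init__(self):
        self._items = {}  # hash key -> {range key: item}

    def put(self, hash_key, range_key, item):
        self._items.setdefault(hash_key, {})[range_key] = item

    def query(self, hash_key, condition=lambda rk: True, limit=None, descending=False):
        """Return (range key, item) pairs for one hash key, sorted by range key."""
        rows = self._items.get(hash_key, {})
        keys = sorted((rk for rk in rows if condition(rk)), reverse=descending)
        if limit is not None:
            keys = keys[:limit]
        return [(rk, rows[rk]) for rk in keys]


def begins_with(prefix):
    return lambda rk: rk.startswith(prefix)


def between(low, high):
    return lambda rk: low <= rk <= high  # inclusive on both ends


table = Table()
for day, total in [("2013-04-23", 30), ("2013-03-10", 20), ("2013-03-02", 10)]:
    table.put("customer-1", day, {"total": total})
table.put("customer-2", "2013-03-15", {"total": 99})

march = table.query("customer-1", between("2013-03-01", "2013-03-31"))
latest = table.query("customer-1", limit=1, descending=True)  # "top N" with N = 1
```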
33. Index Options
local secondary indexes (LSI)
alternate range key + same hash key
index and table data is co-located (same partition)
39. • Method
1. Describe the overall use case – maintain context
2. Identify the individual access patterns of the use case
3. Model each access pattern to its own discrete data set
4. Consolidate data sets into tables and indexes
• Benefits
Single table fetch for each query
Payloads are minimal for each access
Access Pattern Modeling
40. Multi-tenant application for file storing and sharing
• User_ID is the unique identifier of each user
• File_ID is the unique identifier of each file, owned by a user
Good PK selection: User_ID (hash) + File_ID (range)
use case access patterns data design
Design Use Case: Media Catalog
41. 1. Users should be able to query all the files they own
2. Search by File Name
3. Search by File Type
4. Search by Date Range
5. Keep track of Shared Files
Design Use Case: Media Catalog
use case access patterns data design
42. 1. Users should be able to query all the files they own
2. Search by File Name
3. Search by File Type
4. Search by Date Range
5. Keep track of Shared Files
Design Use Case: Media Catalog
use case access patterns data design
additional (non-PK) attributes
& index candidates
43. Users
Hash key = User_ID
Attributes = User_Name, Email, Address
User_Files
Hash key = User_ID
Range key = File_ID
Attributes = Name, Size (N), Date, SharedFlag, Link
DynamoDB Data Model: Main Tables
User has file[]
44. + Secondary Indexes
Table Name   Index Name        Attribute to Index   Projected Attributes
User_Files   NameIndex         Name                 KEYS
User_Files   TypeIndex         Type                 KEYS + Name
User_Files   DateIndex         Date                 KEYS + Name
User_Files   SharedFlagIndex   SharedFlag           KEYS + Name
User_Files   SizeIndex         Size                 KEYS + Name
example only – required data returned
determines optimal projections
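The index table above could be expressed as local secondary index definitions along the following lines. The dictionary keys (`IndexName`, `KeySchema`, `Projection`, `ProjectionType`, `NonKeyAttributes`) follow the DynamoDB CreateTable API; the helper function is hypothetical, and mapping the slide's "KEYS" to `KEYS_ONLY` and "KEYS + Name" to `INCLUDE` is an interpretation. The result would be supplied as the `LocalSecondaryIndexes` parameter when the table is created.

```python
def local_secondary_index(index_name, hash_key, range_key, extra_attributes=()):
    """Build one LSI definition: same hash key as the table, alternate range key."""
    if extra_attributes:
        projection = {"ProjectionType": "INCLUDE", "NonKeyAttributes": list(extra_attributes)}
    else:
        projection = {"ProjectionType": "KEYS_ONLY"}
    return {
        "IndexName": index_name,
        "KeySchema": [
            {"AttributeName": hash_key, "KeyType": "HASH"},
            {"AttributeName": range_key, "KeyType": "RANGE"},
        ],
        "Projection": projection,
    }


USER_FILES_INDEXES = [
    local_secondary_index("NameIndex", "User_ID", "Name"),
    local_secondary_index("TypeIndex", "User_ID", "Type", ["Name"]),
    local_secondary_index("DateIndex", "User_ID", "Date", ["Name"]),
    local_secondary_index("SharedFlagIndex", "User_ID", "SharedFlag", ["Name"]),
    local_secondary_index("SizeIndex", "User_ID", "Size", ["Name"]),
]
```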
54. • Find all files owned by a user
Query User_Files table (User_ID = “2”)
Access Pattern 1
User_ID (Hash)  File_ID (Range)  Name   Date        Type  SharedFlag  Size     Link
1               1                File1  2013-04-23  JPG   -           10000    bucket1
1               2                File2  2013-03-10  MP4   Y           1000000  bucket2
2               3                File3  2013-03-10  MP4   Y           2000000  bucket3
2               4                File4  2013-03-10  AVI   -           3000000  bucket4
3               5                File5  2013-04-10  MP3   -           40000    bucket5
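Access pattern 1 amounts to selecting every item that shares one hash key, returned in range key order. A minimal in-memory sketch using the key and Name columns of the table above follows; the `query_by_hash_key` function is hypothetical, and representing User_ID as a string and File_ID as an integer is an assumption.

```python
USER_FILES = [
    {"User_ID": "1", "File_ID": 1, "Name": "File1"},
    {"User_ID": "1", "File_ID": 2, "Name": "File2"},
    {"User_ID": "2", "File_ID": 3, "Name": "File3"},
    {"User_ID": "2", "File_ID": 4, "Name": "File4"},
    {"User_ID": "3", "File_ID": 5, "Name": "File5"},
]


def query_by_hash_key(items, user_id):
    """Return all items for one hash key, sorted by the range key (File_ID)."""
    matches = (item for item in items if item["User_ID"] == user_id)
    return sorted(matches, key=lambda item: item["File_ID"])


files_of_user_2 = query_by_hash_key(USER_FILES, "2")
```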
58. • Search for file name by file Type
Query
• IndexName = “TypeIndex”
• User_ID = “2”
• Type = “MP4”
Access Pattern 3
TypeIndex (projection: File_ID, Name)
User_ID (hash)  Type (range)  File_ID  Name
1               JPG           1        File1
1               MP4           2        File2
2               AVI           4        File4
2               MP4           3        File3
3               MP3           5        File5
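How a local secondary index relates to its table can be sketched by deriving the index from the User_Files rows: same hash key, a different sort attribute, and only the projected attributes copied. The `build_index` and `query_index` functions are hypothetical. Items that lack the indexed attribute do not appear in the index, which is also why SharedFlagIndex holds only the shared files.

```python
USER_FILES = [
    {"User_ID": "1", "File_ID": 1, "Name": "File1", "Type": "JPG"},
    {"User_ID": "1", "File_ID": 2, "Name": "File2", "Type": "MP4", "SharedFlag": "Y"},
    {"User_ID": "2", "File_ID": 3, "Name": "File3", "Type": "MP4", "SharedFlag": "Y"},
    {"User_ID": "2", "File_ID": 4, "Name": "File4", "Type": "AVI"},
    {"User_ID": "3", "File_ID": 5, "Name": "File5", "Type": "MP3"},
]


def build_index(items, range_attr, projected=()):
    """Return {hash key: entries sorted by range_attr}, skipping items without range_attr."""
    index = {}
    for item in items:
        if range_attr not in item:
            continue  # sparse: no indexed attribute, no index entry
        entry = {k: item[k] for k in ("User_ID", "File_ID", range_attr, *projected)}
        index.setdefault(item["User_ID"], []).append(entry)
    for entries in index.values():
        entries.sort(key=lambda e: (e[range_attr], e["File_ID"]))
    return index


def query_index(index, user_id, range_attr, value):
    """Equality query on the index range key within one hash key."""
    return [e for e in index.get(user_id, []) if e[range_attr] == value]


type_index = build_index(USER_FILES, "Type", projected=["Name"])
shared_index = build_index(USER_FILES, "SharedFlag", projected=["Name"])
mp4_of_user_2 = query_index(type_index, "2", "Type", "MP4")
```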
60. • Search for file name by Date range
Query
• IndexName = “DateIndex”
• User_ID = “1”
• Date between “2013-03-01” and “2013-03-29”
Access Pattern 4
DateIndex (projection: File_ID, Name)
User_ID (hash)  Date (range)  File_ID  Name
1               2013-03-10    2        File2
1               2013-04-23    1        File1
2               2013-03-10    3        File3
2               2013-03-10    4        File4
3               2013-04-10    5        File5
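For reference, the date-range query above might be expressed as low-level Query parameters like these. This is a sketch under assumptions: it uses the current expression syntax (`KeyConditionExpression`), which may differ from the API form in use when these slides were written, it assumes both key attributes are strings, and it uses the `#d` placeholder because `Date` may collide with a reserved word. The helper function is hypothetical and only builds a dictionary; with boto3 it could be passed as `client.query(**params)`.

```python
def date_range_query_params(user_id, start_date, end_date):
    """Build Query parameters for DateIndex; BETWEEN is inclusive on both ends."""
    return {
        "TableName": "User_Files",
        "IndexName": "DateIndex",
        "KeyConditionExpression": "User_ID = :uid AND #d BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#d": "Date"},
        "ExpressionAttributeValues": {
            ":uid": {"S": user_id},      # string type is an assumption
            ":start": {"S": start_date},
            ":end": {"S": end_date},
        },
    }


params = date_range_query_params("1", "2013-03-01", "2013-03-29")
```

Storing dates as ISO 8601 strings, as the slide does, keeps lexicographic order identical to chronological order, so a string range comparison behaves like a date range.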
62. • Search for names of Shared files
Query
• IndexName = “SharedFlagIndex”
• User_ID = “1”
• SharedFlag = “Y”
Access Pattern 5
SharedFlagIndex (projection: File_ID, Name)
User_ID (hash)  SharedFlag (range)  File_ID  Name
1               Y                   2        File2
2               Y                   3        File3
64. • Schema-less
Only key information needed
Individual items can define their own set of attributes
• Consistent Reads
Inventory, shopping cart applications
• Atomic Counters
Increment and return new value in same operation
• Conditional Writes
Expected value before write – fails on mismatch
“state machine” use cases
Highlighted Features
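The semantics of conditional writes and atomic counters can be sketched with a tiny in-memory store. The `ItemStore` class, its method names, and the order-status example are hypothetical; in DynamoDB these checks happen on the service side as part of a single request, whereas this single-process sketch makes no attempt at thread safety.

```python
class ConditionFailed(Exception):
    """Raised when the stored value does not match the expected value."""


class ItemStore:
    def __init__(self):
        self._items = {}

    def put(self, key, attributes):
        self._items[key] = dict(attributes)

    def get(self, key):
        return dict(self._items[key])

    def conditional_update(self, key, attr, expected, new_value):
        """Write new_value only if attr currently equals expected."""
        item = self._items[key]
        if item.get(attr) != expected:
            raise ConditionFailed(f"{attr} is {item.get(attr)!r}, expected {expected!r}")
        item[attr] = new_value

    def increment(self, key, attr, by=1):
        """Add to a counter and return the new value in the same operation."""
        item = self._items.setdefault(key, {})
        item[attr] = item.get(attr, 0) + by
        return item[attr]


store = ItemStore()
store.put("order-1", {"status": "PENDING"})
# "State machine" step: PENDING -> SHIPPED is allowed only from PENDING.
store.conditional_update("order-1", "status", "PENDING", "SHIPPED")
```

Repeating the same transition fails because the stored status is no longer `PENDING`, which is what prevents two writers from both applying it.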
• Third-party library for automating scaling decisions
• Scale up for service levels, scale down for cost
• CloudFormation template for fast deployment
Autoscaling with Dynamic DynamoDB
70. • Cross-Region Export and Import
• DynamoDB Local
Disconnected development with full API support
• No network
• No usage costs
• No SLA
• Geospatial and Transaction Libraries
• Fine-Grained Access Control
Direct-to-DynamoDB access for mobile devices
Other Key Features
Get started today!
aws.amazon.com/dynamodb/developer-resources/