Amazon DynamoDB Design Patterns & Best Practices
© 2011 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc.
1. Amazon DynamoDB Design Patterns & Best Practices
2. InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/amazon-dynamodb-patterns-practices
3. Presented at QCon New York
www.qconnewyork.com
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
4. Internet-scale Database Requirements
Unlimited throughput
• Social applications
• Online gaming
Elasticity and flexibility
• Application could go viral at any time
• Must handle sudden traffic without code changes
Predictable performance
• Low latency
• Latency does not increase and throughput does not decrease as data set size or request rate grows
No administration
5. What is Amazon DynamoDB?
Fully managed NoSQL database service
Accessible via simple web service APIs
Diagram: Client -> Movies Table
Movies Table
Id  Title       Year
1   Terminator  1984
2   Titanic     1997
6. What can I do with DynamoDB?
Offload operating and scaling a highly available
distributed database cluster to AWS
• Serve any level of request traffic
• Store and retrieve any amount of data
• Pay a low price for what you use
Fast time to market
7. DEMO
DynamoDB Speed Test! (www.DynaSpeed.net)
9. Demo Architecture…
Region: three Availability Zones
Cluster controller
100 c1.mediums / 200 virtual CPUs
DynamoDB
10. Demo Architecture…
DynamoDB
Master node/cluster controller
Worker nodes
11. DATA MODEL, DATA TYPES & API
12. Tables, Items, Attributes
Table is a collection of Items
Item is a collection of Attributes (name-value pairs)
Primary key is required
UserProfiles
       HashKey     Attribute1           Attribute2         Attribute3
item1  userid=bob  email=bob@gmail.com  joindate=20121221  Sex=M
item2  userid=ken  email=ken@yahoo.com  joindate=20130210
13. Data Types
Scalar data types
• String, Number, Binary
Multi-valued types
• String Set, Number Set, Binary Set
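The six types above can be illustrated with a small encoder. This is a sketch, not the SDK: the function name `to_attribute_value` is hypothetical, and it assumes the JSON wire format tags each value with a type descriptor (`S`, `N`, `B`, `SS`, `NS`, `BS`), sends numbers as strings, and base64-encodes binary values.

```python
import base64


def to_attribute_value(value):
    """Encode a Python value as a typed attribute value (hypothetical helper)."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; it is not one of the six types above.
        raise TypeError("booleans are not one of the six types listed")
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers travel as strings
    if isinstance(value, bytes):
        return {"B": base64.b64encode(value).decode("ascii")}
    if isinstance(value, (set, frozenset)) and value:
        members = sorted(value)  # sets are unordered; sort only for stable output
        if all(isinstance(m, str) for m in members):
            return {"SS": members}
        if all(isinstance(m, (int, float)) and not isinstance(m, bool) for m in members):
            return {"NS": [str(m) for m in members]}
        if all(isinstance(m, bytes) for m in members):
            return {"BS": [base64.b64encode(m).decode("ascii") for m in members]}
    raise TypeError(f"unsupported value: {value!r}")


item = {"userid": "bob", "size": 200, "tags": {"photos", "shared"}}
encoded = {name: to_attribute_value(value) for name, value in item.items()}
```

Empty sets are rejected in this sketch because a multi-valued attribute needs at least one member.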
14. Indexing
Data is indexed by primary key
Types of primary keys
• Hash
• Hash + Range
15. Local Secondary Index
Alternate range key
Index local to the hash key
All indexed data local to the partition
File Metadata Table
user=bob file=file1.txt date=2013/01/10 size=200 url=s3://bucket/bob/file1.txt
user=bob file=folder1/file1 date=2013/12/21 size=100 url=s3://bucket/bob/folder1/file1
user=bob file=folder1/file2 date=2013/01/10 size=100 url=s3://bucket/bob/folder1/file1
user=ken file=folder1/file1 date=2013/02/10 size=300 shared=Y url=s3://bucket/ken/folder2/file1
user=ken file=file2.jpg date=2013/02/10 size=300 shared=Y url=s3://bucket/ken/file2.jpg
DateIndex
user=bob date=2013/01/10 file=folder1/file2
user=bob date=2013/01/10 file=file1.txt
user=bob date=2013/12/21 file=folder1/file1
user=ken date=2013/02/10 file=folder1/file1
user=ken date=2013/02/10 file=file2.jpg
16. Partitioning
Data is auto-partitioned by hash key
Auto-partitioning driven by:
• Table size
• Provisioned throughput
Diagram: Client -> DynamoDB table (Partition 1 … Partition N)
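A rough way to picture partitioning by hash key is to hash each key and map the result onto a fixed number of partitions. The sketch below is only an illustration: the hash function DynamoDB actually uses is not given in these slides, so MD5 and the modulo mapping are assumptions, and `partition_for` and `spread` are hypothetical names.

```python
import hashlib


def partition_for(hash_key: str, num_partitions: int) -> int:
    """Map a hash key to a partition number (illustrative scheme only)."""
    digest = hashlib.md5(hash_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def spread(keys, num_partitions):
    """Count how many keys land on each partition."""
    counts = [0] * num_partitions
    for key in keys:
        counts[partition_for(key, num_partitions)] += 1
    return counts


# Every item with the same hash key lands on the same partition.
counts = spread([f"user{i}" for i in range(1000)], 4)
```

Because placement depends only on the hash key, a table whose traffic concentrates on a few keys concentrates its load on a few partitions.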
17. Provisioned Throughput Model
Throughput declared/updated via API or console
• CreateTable (foo, reads/sec = 100, writes/sec = 100)
• UpdateTable (foo, reads/sec = 10000, writes/sec = 10000)
DynamoDB handles the rest
• Capacity is reserved and available when needed
• Throughput increases trigger repartitioning and reallocation
High performance at any scale
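The two calls above can be sketched as the request bodies a client would send. This assumes the parameter names of the DynamoDB API version 2012-08-10 (`TableName`, `AttributeDefinitions`, `KeySchema`, `ProvisionedThroughput`); the hash key name `id` and the helper names are hypothetical, and nothing here contacts the service.

```python
import json


def create_table_request(table, hash_key, reads_per_sec, writes_per_sec):
    """Build a CreateTable request body for a hash-key-only table."""
    return {
        "TableName": table,
        "AttributeDefinitions": [{"AttributeName": hash_key, "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": hash_key, "KeyType": "HASH"}],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads_per_sec,
            "WriteCapacityUnits": writes_per_sec,
        },
    }


def update_table_request(table, reads_per_sec, writes_per_sec):
    """Build an UpdateTable request body that changes only the throughput."""
    return {
        "TableName": table,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads_per_sec,
            "WriteCapacityUnits": writes_per_sec,
        },
    }


create = create_table_request("foo", "id", 100, 100)
update = update_table_request("foo", 10000, 10000)
print(json.dumps(update, indent=2))
```

Throughput is expressed in capacity units, so the slide's reads/sec and writes/sec map onto these numbers only approximately for larger items.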
18. API
Manage tables: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables
Read and write items: PutItem, GetItem, UpdateItem, DeleteItem
Bulk get or update: BatchGetItem, BatchWriteItem
Query specific items OR scan the full table: Query, Scan
19. Read Patterns
GetItem (table, key) -> Item
Query (table, hash_key, [range_key_condition]) -> Items
BatchGetItem (table1:key1, … tableN:keyN) -> Items
Scan (table) -> Items
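The four read patterns differ mainly in how much of the table they touch. The in-memory stand-in below is a sketch under stated assumptions: `Table` and `batch_get_item` are hypothetical names, keys are (hash, range) tuples, and none of this is the service API.

```python
class Table:
    """In-memory stand-in for a hash + range table (not the real service)."""

    def __init__(self):
        self._items = {}  # (hash_key, range_key) -> attribute dict

    def put_item(self, hash_key, range_key, **attributes):
        self._items[(hash_key, range_key)] = dict(attributes)

    def get_item(self, hash_key, range_key):
        """GetItem: one item addressed by its full primary key."""
        return self._items.get((hash_key, range_key))

    def query(self, hash_key, range_condition=None):
        """Query: keys under one hash key, in range-key order, optionally filtered."""
        keys = sorted(k for k in self._items if k[0] == hash_key)
        return [k for k in keys if range_condition is None or range_condition(k[1])]

    def scan(self):
        """Scan: every key in the table."""
        return sorted(self._items)


def batch_get_item(requests):
    """BatchGetItem: several full-key lookups, possibly across tables."""
    return [table.get_item(h, r) for table, h, r in requests]


files = Table()
files.put_item("bob", "file1.txt", size=200)
files.put_item("bob", "folder1/file1", size=100)
files.put_item("bob", "folder1/file2", size=100)
files.put_item("ken", "file2.jpg", size=300)

bobs_files = files.query("bob")
in_folder = files.query("bob", lambda name: name.startswith("folder1/"))
```

Query stays within one hash key, which is why it scales with the size of the result rather than the size of the table; Scan does not have that property.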
20. Write Patterns
PutItem (table, key, [attributes])
UpdateItem (table, key, [attributes])
DeleteItem (table, key)
BatchWriteItem (table1:key1[:attributes] … tableN:keyN[:attributes])
21. DYNAMODB CHARACTERISTICS
22. High Availability and Durability
Multi-datacenter (AZ) replication and failover
• If one machine or datacenter fails, another serves your requests
• High availability
• Protects against data loss
23. What DynamoDB Manages For You
Hardware provisioning
Cross-availability zone replication
Monitoring and handling of hardware failures
• Replicas automatically regenerated whenever necessary
Hardware and software updates
24. DynamoDB Scale out
Data is automatically partitioned
Partitions are fully independent
No limits as long as the workload is well spread across hash keys
25. Consistently Low Latencies
Average Put and Get latencies are typically in the single-digit milliseconds
Custom SSD-based storage platform
• Performance independent of table size
• No need for working set to fit in memory
26. Authentication & Wire format
Session-based authentication
• Client establishes a session via AWS Security Token Service (STS) and retrieves a token
• Client signs requests with the session token, which is valid for a few hours
• Streamlines authentication to minimize latency
Request and Response parameters encoded in JSON
• Widely adopted industry standard
• Relatively compact and efficient to parse
27. Consistency
Strictly or eventually consistent reads
• Specified at API level for maximum flexibility
• The tradeoff is in throughput consumed, not in latency
Strictly consistent writes
• Atomic increment/decrement and get
• Conditional write a.k.a. optimistic concurrency control
GetItem & Query APIs support eventually consistent and consistent reads
Scan & BatchGetItem only support eventually consistent reads
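A conditional write, the optimistic concurrency control mentioned above, can be sketched with a version attribute: a write succeeds only if the stored version still matches the one the writer read. `VersionedStore`, `ConditionFailed` and the `version` attribute are hypothetical names for this in-memory illustration; in the service the condition travels with the write request itself.

```python
class ConditionFailed(Exception):
    """Raised when the stored version differs from the one the writer expected."""


class VersionedStore:
    def __init__(self):
        self._items = {}

    def get(self, key):
        return dict(self._items[key])

    def put(self, key, attributes, expected_version=None):
        """Write only if the current version equals expected_version (None = no item yet)."""
        current = self._items.get(key)
        current_version = current["version"] if current else None
        if current_version != expected_version:
            raise ConditionFailed(f"expected {expected_version}, found {current_version}")
        new_item = dict(attributes)
        new_item["version"] = (current_version or 0) + 1
        self._items[key] = new_item


store = VersionedStore()
store.put("bob", {"score": 10})  # create: expects no existing item

# Two clients read the same version, then both try to write.
a = store.get("bob")
b = store.get("bob")
store.put("bob", {"score": a["score"] + 5}, expected_version=a["version"])
try:
    store.put("bob", {"score": b["score"] + 1}, expected_version=b["version"])
    second_write_succeeded = True
except ConditionFailed:
    second_write_succeeded = False  # the stale writer must re-read and retry
```

The losing writer is never blocked; it simply learns that its read was stale, which is what makes the scheme optimistic.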
28. Transactions
Supports item-level transactions
• UpdateItem, PutItem and DeleteItem operate at the item level and their changes are ACID
• UpdateItem supports atomic ADD and Get
No multi-item or cross-table transactions
• BatchWriteItem operates on multiple items and across tables, but it only supports transactions at the item level
29. MODELING RELATIONSHIPS IN DYNAMODB
30. Modeling 1:1 relationships
Use a table with a hash key
Examples:
• Users
• Hash key = UserId
• Games
• Hash key = GameId
Users Table
Hash key        Attributes
UserId = bob    Email = bob@gmail.com, JoinDate = 2011-11-15
UserId = fred   Email = fred@yahoo.com, JoinDate = 2011-12-01, Sex = M
Games Table
Hash key        Attributes
GameId = Game1  LaunchDate = 2011-10-15, Version = 2
GameId = Game2  LaunchDate = 2010-05-12, Version = 3
GameId = Game3  LaunchDate = 2012-01-20, Version = 1
31. Modeling 1:N relationships
Use a table with hash and range key
Example:
• One (1) User can play many (N) Games
• User_Games table
– Hash key = UserId
– Range key = GameId
User_Games table
Hash key       Range key       Attributes
UserId = bob   GameId = Game1  HighScore = 10500, ScoreDate = 2011-10-20
UserId = fred  GameId = Game2  HighScore = 12000, ScoreDate = 2012-01-10
UserId = bob   GameId = Game3  HighScore = 20000, ScoreDate = 2012-02-12
32. Modeling N:M relationships
Use two tables, each with a hash and range key
Example:
• One User can play many Games
• Hash key = UserId
• Range key = GameId
• One Game can have many Users
• Hash key = GameId
• Range key = UserId
User_Games
Hash Key Range key
UserId = bob GameId = Game1
UserId = fred GameId = Game2
UserId = bob GameId = Game3
Game_Users
Hash Key Range key
GameId = Game1 UserId = bob
GameId = Game2 UserId = fred
GameId = Game3 UserId = bob
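Keeping the two tables in step is the application's job. The sketch below models each table as a mapping from hash key to a set of range keys; `record_play` is a hypothetical helper, and the two writes are not atomic with respect to each other, since the deck states there are no cross-table transactions.

```python
from collections import defaultdict

user_games = defaultdict(set)  # hash key UserId -> range keys GameId
game_users = defaultdict(set)  # hash key GameId -> range keys UserId


def record_play(user_id, game_id):
    """Write the relationship in both directions (two separate writes)."""
    user_games[user_id].add(game_id)
    game_users[game_id].add(user_id)


record_play("bob", "Game1")
record_play("fred", "Game2")
record_play("bob", "Game3")

games_for_bob = sorted(user_games["bob"])    # Query User_Games by UserId
users_of_game1 = sorted(game_users["Game1"])  # Query Game_Users by GameId
```

A failure between the two writes leaves one direction missing, so writes should be safe to repeat; adding to a set is idempotent here.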
33. Modeling Multi-tenancy
Use tenant id as the hash key
• Example: UserId in the User Profiles and User Scores tables
34. MODELING EXAMPLE
35. Example 1: Multi-tenant application for storing file metadata
Access Patterns
1. Get bob’s profile
2. List files owned by ‘bob’
3. List bob’s files created between T1 and T2
4. List bob’s shared files
5. List bob’s files by descending order of file size
36. Entities and Relationships
Entities:
• Users
• Files
Relationship
• One User has many Files (1:N)
37. DynamoDB Data Model
Hash key (Tenant ID) = user
Range key = file
Users (hash)
user=bob email=bob@gmail.com joindate=‘2012/12/21’
user=ken email=ken@yahoo.com joindate=‘2013/02/10’
Files (hash-range)
user=bob file=file1.txt date=2013/01/10 size=200 url=s3://bucket/bob/file1.txt
user=bob file=folder1/file1 date=2013/12/21 size=100 url=s3://bucket/bob/folder1/file1
user=bob file=folder1/file2 date=2013/01/10 size=100 url=s3://bucket/bob/folder1/file1
user=ken file=folder1/file1 date=2013/02/10 size=300 shared=Y url=s3://bucket/ken/folder2/file1
user=ken file=file2.jpg date=2013/02/10 size=300 shared=Y url=s3://bucket/ken/file2.jpg
38. Primary Index Get & Query
Get bob’s profile
• GetItem (table = Users, user = ‘bob’)
user=bob email=bob@gmail.com joindate=‘2012/12/21’
List files owned by ‘bob’
• Query (table = Files, user = ‘bob’)
user=bob file=file1.txt date=2013/01/10 size=200 url=s3://bucket/bob/file1.txt
user=bob file=folder1/file1 date=2013/12/21 size=100 url=s3://bucket/bob/folder1/file1
user=bob file=folder1/file2 date=2013/01/10 size=100 url=s3://bucket/bob/folder1/file1
39. Local Secondary Index Query
List bob’s files & folders created between T1 and T2
• Query (table = Files, user = bob, IndexName = DateIndex, date BETWEEN 2013/01/10 and 2013/01/20)
DateIndex
user=bob date=2013/01/10 file=folder1/file2
user=bob date=2013/01/10 file=file1.txt
user=bob date=2013/12/21 file=folder1/file1
user=ken date=2013/02/10 file=folder1/file1
user=ken date=2013/02/10 file=file2.jpg
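The BETWEEN query above amounts to a range lookup over index entries that are sorted by the alternate range key within one hash key. This is a sketch of that idea only: `build_index` and `query_between` are hypothetical names, the data comes from the slide, and dates are compared as strings, which works because the yyyy/mm/dd format sorts chronologically.

```python
import bisect

FILES = [
    {"user": "bob", "file": "file1.txt", "date": "2013/01/10"},
    {"user": "bob", "file": "folder1/file1", "date": "2013/12/21"},
    {"user": "bob", "file": "folder1/file2", "date": "2013/01/10"},
    {"user": "ken", "file": "folder1/file1", "date": "2013/02/10"},
    {"user": "ken", "file": "file2.jpg", "date": "2013/02/10"},
]


def build_index(items, hash_attr, range_attr):
    """Group entries by hash key and sort each group by the alternate range key."""
    index = {}
    for item in items:
        index.setdefault(item[hash_attr], []).append((item[range_attr], item["file"]))
    return {hash_value: sorted(entries) for hash_value, entries in index.items()}


def query_between(index, hash_value, low, high):
    """Return the entries under one hash key whose range key lies in [low, high]."""
    entries = index.get(hash_value, [])
    range_keys = [entry[0] for entry in entries]
    start = bisect.bisect_left(range_keys, low)
    end = bisect.bisect_right(range_keys, high)
    return entries[start:end]


date_index = build_index(FILES, "user", "date")
result = query_between(date_index, "bob", "2013/01/10", "2013/01/20")
```

Only bob's group is searched, which mirrors the index being local to the hash key.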
40. Local Secondary (sparse) Index Query
List bob’s shared files & folders
• Query (Table = Files, user = bob, IndexName = SharedIndex, shared = Y)
• No matches found
SharedIndex
user=ken shared=Y file=folder1/file1
user=ken shared=Y file=file2.jpg
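An index is sparse when it holds entries only for items that carry the indexed attribute. The sketch below illustrates why the query for bob finds nothing; `build_sparse_index` is a hypothetical name and the data is a subset of the slide's Files table.

```python
FILES = [
    {"user": "bob", "file": "file1.txt"},
    {"user": "bob", "file": "folder1/file1"},
    {"user": "ken", "file": "folder1/file1", "shared": "Y"},
    {"user": "ken", "file": "file2.jpg", "shared": "Y"},
]


def build_sparse_index(items, hash_attr, range_attr):
    """Index only the items that actually have the range attribute."""
    index = {}
    for item in items:
        if range_attr in item:  # items lacking the attribute get no index entry
            index.setdefault(item[hash_attr], []).append((item[range_attr], item["file"]))
    return {hash_value: sorted(entries) for hash_value, entries in index.items()}


shared_index = build_sparse_index(FILES, "user", "shared")
bobs_shared = shared_index.get("bob", [])
kens_shared = shared_index.get("ken", [])
```

The index stays small when few items are shared, so listing shared files does not require reading every file a user owns.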
41. Local Secondary Index Query (backwards)
List bob’s files & folders by descending order of size
• Query (Table = Files, user = bob, IndexName = SizeIndex, ScanIndexForward = false)
SizeIndex
user=bob size=100 file=folder1/file2
user=bob size=100 file=folder1/file1
user=bob size=200 file=file1.txt
user=ken size=0 file=file2.jpg
user=ken size=300 file=folder1/file1
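Reading an index backwards only reverses the traversal order of entries that are already sorted. A minimal sketch, assuming entries are (size, file) pairs for one hash key; `query_index` is a hypothetical name, and ordering ties by file name is an assumption of this sketch rather than documented behaviour.

```python
def query_index(entries, scan_index_forward=True):
    """Return index entries in ascending order, or descending when scanning backwards."""
    ordered = sorted(entries)
    return ordered if scan_index_forward else ordered[::-1]


# bob's entries from SizeIndex: (size, file)
bob_size_index = [(100, "folder1/file2"), (100, "folder1/file1"), (200, "file1.txt")]

largest_first = query_index(bob_size_index, scan_index_forward=False)
```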
42. BEST PRACTICES
43. Storing Large Items – Pattern 1
Break large attributes across multiple DynamoDB items
MESSAGE-ID (hash key)  PART (range key)  Attributes
1                      0                 FROM = ‘user1’, TO = ‘user2’, DATE = ‘12/12/2011’, SUBJECT = ‘DynamoDB Best practices’, BODY = ‘The first few Kbytes…..’
1                      1                 BODY = ‘ the next 64k’
1                      2                 BODY = ‘ the next 64k’
1                      3                 EOM
Store large attributes in Amazon S3
MESSAGE-ID (hash key)  Attributes
1                      FROM = ‘user1’, TO = ‘user2’, DATE = ‘12/12/2011’, SUBJECT = ‘DynamoDB Best practices’, BODY = ‘The first few Kbytes…..’, BODY_OVERFLOW = ‘S3bucket+key’
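Splitting a large body across items with a PART range key could look like the sketch below. `split_message` and `join_message` are hypothetical helpers, the chunk size counts characters rather than encoded bytes, and the final item acts as the end-of-message (EOM) marker shown in the slide.

```python
def split_message(message_id, headers, body, chunk_size=64 * 1024):
    """Break a message into items keyed by (MESSAGE-ID, PART), ending with an EOM item."""
    chunks = [body[i:i + chunk_size] for i in range(0, len(body), chunk_size)] or [""]
    items = []
    for part, chunk in enumerate(chunks):
        item = {"MESSAGE-ID": message_id, "PART": part, "BODY": chunk}
        if part == 0:
            item.update(headers)  # headers live only on the first part
        items.append(item)
    items.append({"MESSAGE-ID": message_id, "PART": len(chunks), "EOM": True})
    return items


def join_message(items):
    """Reassemble the body from the parts, in PART order."""
    parts = sorted((i for i in items if "BODY" in i), key=lambda i: i["PART"])
    return "".join(p["BODY"] for p in parts)


items = split_message(1, {"FROM": "user1", "TO": "user2"}, "abcdefghij", chunk_size=4)
body = join_message(items)
```

Because all parts share one hash key, a single Query on MESSAGE-ID returns them in PART order.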
44. Storing Large Items – Pattern 2
Use an overflow table for large attributes
Retrieve items via BatchGetItem
Before: MailBox Table
ID (hash key), Timestamp (range key), Attribute1, Attribute2, Attribute3, …., AttributeN, LargeAttribute
After: MailBox Table
ID (hash key), Timestamp (range key), Attribute1, Attribute2, Attribute3, …., AttributeN, LargeAttributeUUID
After: Overflow Table
LargeAttributeUUID, LargeAttribute
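The overflow pattern replaces the large value with a pointer and stores the value in a second table. In this sketch both tables are plain dictionaries and `put_mail` and `get_large_attributes` are hypothetical helpers; the second function stands in for a BatchGetItem call against the overflow table.

```python
import uuid

mailbox = {}   # (ID, Timestamp) -> item holding a LargeAttributeUUID pointer
overflow = {}  # LargeAttributeUUID -> LargeAttribute


def put_mail(mail_id, timestamp, attributes, large_attribute):
    """Store the large value in the overflow table and keep only its UUID in the mailbox."""
    pointer = str(uuid.uuid4())
    overflow[pointer] = large_attribute
    mailbox[(mail_id, timestamp)] = dict(attributes, LargeAttributeUUID=pointer)


def get_large_attributes(mail_keys):
    """Resolve the large values for several mails in one batch."""
    pointers = [mailbox[key]["LargeAttributeUUID"] for key in mail_keys]
    return [overflow[pointer] for pointer in pointers]


put_mail("bob", 1001, {"Subject": "hello"}, "x" * 100_000)
put_mail("bob", 1002, {"Subject": "again"}, "y" * 50_000)
bodies = get_large_attributes([("bob", 1001), ("bob", 1002)])
```

Listing a mailbox then reads only small items, and the large values are fetched only for the mails that are opened.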
45. Storing Time Series Data
Your application wants to keep one year of historic data
You can pre-create one table per week (or per day or per month) and insert records into the appropriate table based on timestamp
Single table: Events_table_2012
Event_id (hash key), Timestamp (range key), Attribute1 …. Attribute N
One table per week: Events_table_2012_05_week1, Events_table_2012_05_week2, Events_table_2012_05_week3
Event_id (hash key), Timestamp (range key), Attribute1 …. Attribute N
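Routing a record to its table is a matter of deriving the table name from the timestamp. The sketch below follows the naming shown in the slide; how the presenter numbers weeks is not stated, so treating days 1 to 7 as week1, 8 to 14 as week2 and so on is an assumption, and `events_table_for` is a hypothetical name.

```python
from datetime import datetime


def events_table_for(timestamp: datetime) -> str:
    """Return the per-week table name for an event timestamp."""
    week = (timestamp.day - 1) // 7 + 1  # assumed week-of-month rule
    return f"Events_table_{timestamp.year}_{timestamp.month:02d}_week{week}"


table = events_table_for(datetime(2012, 5, 3, 14, 30))
```

With one table per period, old data can be retired by deleting a whole table instead of deleting items one at a time.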
46. Searching across items (with different hash keys)
Create additional tables which will serve as indexes
• Example: First_name_index & Last_name_index
Query: Get me all the Users data for First_name = ‘Tim’
• Query First_name_index for hash key = ‘Tim’
• This will return User_id = (101, 201)
• BatchGet (Users, [101, 201])
Users
User_Id (hash key)  First_name  Last_name  …
101                 Tim         White
201                 Tim         Black
301                 Ted         White
401                 Keith       Brown
501                 Keith       White
601                 Keith       Black
First_name_index
First_name (hash key)  User_id (range key)
Tim                    101
Tim                    201
Ted                    301
Keith                  401
Keith                  501
Keith                  601
Last_name_index
Last_name (hash key)  User_id (range key)
White                 101
Black                 201
White                 301
Brown                 401
White                 501
Black                 601
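The two-step lookup above, a Query on the index table followed by a BatchGet on Users, can be sketched with dictionaries. The data is from the slide; `find_by_first_name` is a hypothetical helper, and keeping the index table up to date on every write to Users is left to the application.

```python
users = {
    101: {"First_name": "Tim", "Last_name": "White"},
    201: {"First_name": "Tim", "Last_name": "Black"},
    301: {"First_name": "Ted", "Last_name": "White"},
    401: {"First_name": "Keith", "Last_name": "Brown"},
}

# Index table: First_name (hash key) -> User_id values (range keys)
first_name_index = {}
for user_id, user in users.items():
    first_name_index.setdefault(user["First_name"], []).append(user_id)


def find_by_first_name(name):
    """Step 1: query the index table. Step 2: batch-get the matching users."""
    user_ids = sorted(first_name_index.get(name, []))
    return [(user_id, users[user_id]) for user_id in user_ids]


tims = find_by_first_name("Tim")
```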
47. Avoiding Hot Keys
Use multiple keys (aliases) instead of a single hot key
Generate aliases by prefixing or suffixing a number from a known range (1..N)
Use BatchGetItem API to retrieve ticket counts for all the aliases (1_Avatar, 2_Avatar, 3_Avatar,…, N_Avatar) and sum them in your client application
MOVIES
MNAME (hash key)
1_Avatar          TicketCount = 4,000,000
2_Avatar          TicketCount = 2,000,000
3_Avatar          TicketCount = 4,000,000
….
N_Avatar          TicketCount = 4,000,000
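Spreading one hot counter over N aliases might look like the sketch below. Choosing the alias at random on each write is an assumption, since the slide only says the aliases come from a known range; `N`, `alias`, `add_tickets` and `total_tickets` are hypothetical names, and a dictionary stands in for the MOVIES table.

```python
import random

N = 10
ticket_counts = {}  # alias (hash key) -> TicketCount


def alias(movie, shard):
    """Build an alias such as 3_Avatar."""
    return f"{shard}_{movie}"


def add_tickets(movie, count, rng=random):
    """Write to one of the N aliases so no single key takes all the traffic."""
    key = alias(movie, rng.randint(1, N))
    ticket_counts[key] = ticket_counts.get(key, 0) + count


def total_tickets(movie):
    """Read every alias (a BatchGetItem in the service) and sum on the client."""
    return sum(ticket_counts.get(alias(movie, shard), 0) for shard in range(1, N + 1))


for _ in range(100):
    add_tickets("Avatar", 5)
total = total_tickets("Avatar")
```

Writes become cheaper to scale while reads cost N lookups, so N is a tradeoff between write spread and read fan-out.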
48. When to use and when not to use DynamoDB
When to use
Key-value or simple queries
Very high read/write rate
Need auto-sharding
Need on-line scaling across multiple nodes
Consistently low latency
No size or throughput limits
No tuning
High durability
When not to use
Need multi-item/row or cross-table transactions
Need complex queries, joins
Need real-time analytics on historic data
49. Questions?
50. Backup slides
51. DynamoDB / Elastic MapReduce integration
Harness Hadoop parallel processing pipeline to
• Perform complex analytics
• Join DynamoDB tables with outside data sources like S3
• Export data from DynamoDB to S3
• Import data from S3 into DynamoDB
Easy to leverage DynamoDB’s scale
52. Diagram: S3, EMR and DynamoDB