by Rajeev Srinivasan, Sr. Solutions Architect, AWS
Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale. It is a fully managed cloud database and supports both document and key-value store models. Its flexible data model, reliable performance, and automatic scaling of throughput capacity make it a great fit for mobile, web, gaming, ad tech, IoT, and many other applications. We’ll take a look at how DynamoDB works and how it can be accelerated by DAX, the DynamoDB Accelerator.
3. NoSQL foundations
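[Figure: the main NoSQL data models (key-value, graph, document, and column-family), illustrated with key-value pairs (0000: “Texas”, 0001: “Illinois”, 0002: “Oregon”) and a column-family row (key 0000-0000-0000-0001 with columns Game Heroes, Version 3.4, CRC ADE4). Timeline: Fall 2007, the paper “Dynamo: Amazon’s Highly Available Key-value Store”; June 2009, the NoSQL meetup at 235 2nd St, San Francisco; January 2012, the launch of Amazon DynamoDB.]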
4. What (some) customers store in NoSQL DBs
• Market Orders
• Tokenization (PHI, Credit Cards)
• Chat Messages
• User Profiles (Mobile)
• IoT Sensor Data (and device status)
• File Metadata
• Social Media Feeds
5. DataXu’s Attribution Store
[Architecture diagram. Inputs: 1st-party and 3rd-party data. Components: AWS Direct Connect, Amazon S3 bucket, Amazon EC2, an Amazon EMR job, AWS Data Pipeline, Amazon DynamoDB, Amazon RDS, Amazon SNS, Amazon CloudWatch, and AWS IAM.]
“Attribution” is the marketing term of art for allocating full or partial credit to individual advertisements that eventually lead to a purchase or other desired consumer interaction.
6. Amazon DynamoDB
• Highly available
• Consistent, single-digit millisecond latency at any scale
• Fully managed
• Secure
• Integrates with AWS Lambda, Amazon Redshift, and more
7. Elastic is the new normal
[Chart: Write Capacity Units, Read Capacity Units, and ConsumedCapacityUnits over time, with annotations marking increases of more than 200% and more than 300% from baseline.]
10. DynamoDB LSIs
Local secondary indexes
[Diagram: indexes that keep the table's partition key A1 but use A3, A4, or A5 as the sort key, each carrying the remaining attributes.]
• Alternate sort key attribute
• Index is local to a partition key
• Can only be defined as part of initial table creation
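To make “local to a partition key” concrete, here is a minimal in-memory sketch, not the DynamoDB API: it groups hypothetical items by the table's partition key A1 and re-sorts each group by an alternate sort key A4. The functions `build_lsi` and `query_lsi` and the sample items are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical sample items: A1 is the table's partition key, A3 its sort key.
items = [
    {"A1": "cust#1", "A3": "2017-01-05", "A4": 10, "A2": "x"},
    {"A1": "cust#1", "A3": "2017-01-01", "A4": 30, "A2": "y"},
    {"A1": "cust#2", "A3": "2017-01-03", "A4": 20, "A2": "z"},
]

def build_lsi(items, partition_key, alt_sort_key):
    """Group items by the table's partition key, then sort each group by an
    alternate sort key. Grouping never crosses partition key values, which is
    what makes the index 'local'."""
    index = defaultdict(list)
    for item in items:
        index[item[partition_key]].append(item)
    for group in index.values():
        group.sort(key=lambda it: it[alt_sort_key])
    return index

def query_lsi(index, pk_value):
    """Items for one partition key value, in alternate sort key order."""
    return index.get(pk_value, [])

lsi_a4 = build_lsi(items, "A1", "A4")
print([it["A3"] for it in query_lsi(lsi_a4, "cust#1")])  # ['2017-01-05', '2017-01-01']
```

Note that the index order differs from the table's A3 order, but a query still only ever sees one partition key value.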
11. DynamoDB GSIs
Global secondary indexes
[Diagram: indexes that use A3 as the partition key and carry the table key A1, shown with three projection types: KEYS_ONLY (A3, A1), INCLUDE A2 (A3, A1, A2), and ALL (A3, A1, A2, A4, A7).]
• Alternate partition (+sort) key
• Index is across all table partition keys
• Can be added or removed anytime
• Eventually consistent
• RCUs/WCUs provisioned separately for GSIs
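A GSI regroups items under a new partition key across the whole table and stores only the projected attributes. The sketch below models that in memory with the three projection types from the diagram; it ignores eventual consistency and separately provisioned capacity, and `project`, `build_gsi`, and the sample items are hypothetical names for illustration.

```python
from collections import defaultdict

TABLE_KEY = "A1"  # the base table's key attribute in this example

def project(item, index_pk, projection, include=()):
    """Copy the attributes a GSI stores for one item under a projection type."""
    if projection == "ALL":
        return dict(item)
    keep = {index_pk, TABLE_KEY}  # KEYS_ONLY: index key plus table key
    if projection == "INCLUDE":
        keep |= set(include)      # plus the explicitly named attributes
    elif projection != "KEYS_ONLY":
        raise ValueError(f"unknown projection {projection!r}")
    return {k: v for k, v in item.items() if k in keep}

def build_gsi(items, index_pk, projection, include=()):
    """Regroup items under a new partition key, across all table partition keys.
    Items lacking the index key attribute are left out (a sparse index)."""
    index = defaultdict(list)
    for item in items:
        if index_pk in item:
            index[item[index_pk]].append(project(item, index_pk, projection, include))
    return index

items = [
    {"A1": "t1", "A2": "a", "A3": "red", "A4": 1, "A7": True},
    {"A1": "t2", "A2": "b", "A3": "red", "A4": 2, "A7": False},
    {"A1": "t3", "A2": "c", "A4": 3},  # no A3, so absent from the index
]
gsi = build_gsi(items, "A3", "INCLUDE", include=["A2"])
print(gsi["red"])
# [{'A1': 't1', 'A2': 'a', 'A3': 'red'}, {'A1': 't2', 'A2': 'b', 'A3': 'red'}]
```

Items from different table partition keys (t1 and t2) land together under the same index key, which is the “global” part.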
12. Data types
Type              DynamoDB type
String            String
Integer, Float    Number
Timestamp         Number or String
Blob              Binary
Boolean           Boolean
Null              Null
List              List
Set               Set of String, Number, or Binary
Map               Map
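On the wire, the low-level DynamoDB API tags every value with a type descriptor (S, N, B, BOOL, NULL, L, M, SS, NS, BS), and numbers travel as strings. This sketch maps Python values onto those descriptors following the table above; `to_dynamodb` is a hypothetical helper, and accepting plain `float` is a simplification (boto3's own `TypeSerializer` requires `Decimal` instead).

```python
from decimal import Decimal

NUMBER_TYPES = (int, float, Decimal)

def is_number(v):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(v, NUMBER_TYPES) and not isinstance(v, bool)

def to_dynamodb(value):
    """Convert a Python value to DynamoDB's typed attribute-value form.
    Numbers are sent as strings under "N"."""
    if isinstance(value, bool):
        return {"BOOL": value}
    if value is None:
        return {"NULL": True}
    if isinstance(value, str):
        return {"S": value}
    if is_number(value):
        return {"N": str(value)}
    if isinstance(value, bytes):
        return {"B": value}
    if isinstance(value, (set, frozenset)):
        if not value:
            raise ValueError("DynamoDB sets cannot be empty")
        # Sorting only makes the output deterministic; sets are unordered.
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(is_number(v) for v in value):
            return {"NS": sorted(str(v) for v in value)}
        if all(isinstance(v, bytes) for v in value):
            return {"BS": sorted(value)}
        raise TypeError("a set must hold only strings, only numbers, or only binary")
    if isinstance(value, (list, tuple)):
        return {"L": [to_dynamodb(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {str(k): to_dynamodb(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")

print(to_dynamodb({"OrderId": 1, "Tags": ["x", 2], "Shipped": False}))
```

Timestamps have no dedicated type, so they would be passed in already converted to an epoch Number or an ISO-8601 String.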
13. Table creation options
CreateTable
• TableName (required): unique to account and Region
• PartitionKey (required): AttributeName, Type [S, N, B]
• SortKey (optional): AttributeName, Type [S, N, B]
• Key attribute types: String, Number, Binary ONLY
• Provisioned Reads (required): 1+ per second
• Provisioned Writes (required): 1+ per second
• LSI Schema (optional)
• GSI Schema (optional), with its own Provisioned Reads: 1+ and Provisioned Writes: 1+
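The required and optional pieces above map onto the parameters of a CreateTable request. This is a minimal sketch, assuming a provisioned-capacity table with no indexes; `create_table_params` is a hypothetical helper that only builds and validates the arguments (it cannot check that the name is unique in the account and Region).

```python
import re

KEY_TYPES = {"S", "N", "B"}  # String, Number, Binary: the only key types

def create_table_params(table_name, partition_key, sort_key=None,
                        read_units=1, write_units=1):
    """Build keyword arguments for a provisioned-capacity CreateTable call.
    partition_key and sort_key are (attribute_name, attribute_type) tuples."""
    if not re.fullmatch(r"[A-Za-z0-9_.-]{3,255}", table_name):
        raise ValueError("table names use 3 to 255 characters: letters, digits, _ . -")
    keys = [(partition_key, "HASH")]
    if sort_key is not None:
        keys.append((sort_key, "RANGE"))
    for (name, attr_type), _ in keys:
        if attr_type not in KEY_TYPES:
            raise ValueError(f"key {name!r} must be S, N, or B, not {attr_type!r}")
    if read_units < 1 or write_units < 1:
        raise ValueError("provisioned reads and writes must each be at least 1")
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": name, "AttributeType": attr_type}
            for (name, attr_type), _ in keys
        ],
        "KeySchema": [
            {"AttributeName": name, "KeyType": key_type}
            for (name, _), key_type in keys
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }

params = create_table_params("CustomerOrdersTable", ("CustomerId", "N"),
                             sort_key=("OrderId", "N"), read_units=5, write_units=5)
print(params["KeySchema"])
```

With the AWS SDK for Python installed and credentials configured, `boto3.client("dynamodb").create_table(**params)` would submit this request. LSIs and GSIs would go in the `LocalSecondaryIndexes` and `GlobalSecondaryIndexes` parameters, which the sketch omits.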
14. Provisioned capacity
Read Capacity Unit (RCU)
1 RCU returns up to 4 KB of data per second with a strongly consistent read, or double the data (8 KB) at the same cost with eventually consistent reads.
Write Capacity Unit (WCU)
1 WCU writes up to 1 KB of data per second, and each item write consumes at least 1 WCU.
Capacity is per second; item sizes are rounded up to the next whole 4 KB (reads) or 1 KB (writes).
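The rounding rules are easiest to see as arithmetic. This sketch, with hypothetical function names, takes the item size in bytes as given (DynamoDB counts attribute names as well as values toward that size) and computes the whole RCUs and WCUs to provision for a steady request rate.

```python
import math

KB = 1024

def rcus_per_read(item_bytes, strongly_consistent=True):
    """1 RCU covers up to 4 KB (rounded up) for a strongly consistent read;
    an eventually consistent read of the same item costs half as much."""
    units = max(1, math.ceil(item_bytes / (4 * KB)))
    return units if strongly_consistent else units / 2

def wcus_per_write(item_bytes):
    """1 WCU covers up to 1 KB (rounded up); every write costs at least 1 WCU."""
    return max(1, math.ceil(item_bytes / KB))

def rcus_to_provision(reads_per_second, item_bytes, strongly_consistent=True):
    """Whole RCUs needed for a steady read rate."""
    return math.ceil(reads_per_second * rcus_per_read(item_bytes, strongly_consistent))

def wcus_to_provision(writes_per_second, item_bytes):
    """Whole WCUs needed for a steady write rate."""
    return writes_per_second * wcus_per_write(item_bytes)

# 10 reads/s of 6 KB items: each read rounds up to 8 KB, i.e. 2 RCUs.
print(rcus_to_provision(10, 6 * KB))         # 20
print(rcus_to_provision(10, 6 * KB, False))  # 10
print(wcus_to_provision(10, 1536))           # 20 (1.5 KB rounds up to 2 WCUs)
```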
15. Horizontal sharding
[Diagram: CustomerOrdersTable sharded horizontally across hosts (Host 1, Host 99, Host n). Each new host brings compute, storage, and network bandwidth.]
17. Partitioning: CustomerOrdersTable
[Diagram: the keyspace from Hash.MIN = 00 to Hash.MAX = FF is divided among Partitions A, B, and C, each owning 33.33% of the keyspace and 33.33% of provisioned capacity. Over time, a partition split due to partition size divides one partition into two (Partitions D and E, 16.66% each) while A and B keep 33.33%; a partition split due to capacity increase halves every partition, giving Partitions A through F at 16.66% each.]
Split for partition size: the desired size of a partition is 10 GB*, and when a partition surpasses this it can split.
Split for provisioned capacity: the desired capacity of a partition is expressed as 3w + 1r < 3000*, where w = WCU and r = RCU.
*Subject to change.
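The two split rules can be combined into a rough partition estimate. This sketch uses the slide's thresholds (10 GB per partition and 3w + 1r < 3000 per partition), both flagged as subject to change, so treat it as a mental model of how provisioned capacity gets divided rather than as the service's current behavior; the function names are hypothetical.

```python
import math

# Thresholds quoted on the slide; both are marked "subject to change".
PARTITION_SIZE_GB = 10
PARTITION_CAPACITY = 3000  # per partition: 3 * WCU + 1 * RCU must stay below this

def estimated_partitions(table_gb, rcus, wcus):
    """Rough partition count: enough partitions for both size and capacity."""
    by_size = math.ceil(table_gb / PARTITION_SIZE_GB)
    # Smallest n such that (3w + r) / n < 3000.
    by_capacity = (3 * wcus + rcus) // PARTITION_CAPACITY + 1
    return max(1, by_size, by_capacity)

def capacity_per_partition(rcus, wcus, partitions):
    """Provisioned capacity is divided evenly, as in the keyspace diagram."""
    return rcus / partitions, wcus / partitions

n = estimated_partitions(25, 1000, 500)
print(n, [round(c, 1) for c in capacity_per_partition(1000, 500, n)])  # 3 [333.3, 166.7]
```

In this example size alone forces three partitions, so each one gets only a third of the table's provisioned throughput, which is why a single hot key can be throttled well below the table-level numbers.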
18. Partitioning: 3-way replication
[Diagram: CustomerOrdersTable split into Partition A (hash range 00–54), Partition B (55–A9), and Partition C (AA–FF), each provisioned with 1000 RCUs and 100 WCUs. Each partition is stored on one host in each of Availability Zones A, B, and C: Partition A on Hosts A, E, and H; Partition B on Hosts B, F, and I; Partition C on Hosts C, G, and J.]
Data is replicated to three Availability Zones by design.
Example: the item {OrderId: 1, CustomerId: 1, ASIN: [B00X4WHP5E]} has Hash(1) = 7B, which falls in Partition B's range (55–A9).
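Routing an item is a range lookup on its hash. DynamoDB's real hash function is internal, so this sketch does not compute one; it takes the slide's Hash(1) = 7B as given, uses the slide's one-byte ranges and host placement, and finds the owning partition and its three replicas. `partition_for_hash` and `replicas_for_hash` are hypothetical names.

```python
import bisect

# Inclusive lower bounds of each partition's hash range, from the slide.
PARTITION_STARTS = [0x00, 0x55, 0xAA]
PARTITION_NAMES = ["A", "B", "C"]
# One replica per Availability Zone (A, B, C), from the diagram.
REPLICAS = {
    "A": ["Host A", "Host E", "Host H"],
    "B": ["Host B", "Host F", "Host I"],
    "C": ["Host C", "Host G", "Host J"],
}

def partition_for_hash(hash_byte):
    """Find the partition whose range contains a one-byte hash value."""
    if not 0x00 <= hash_byte <= 0xFF:
        raise ValueError("hash must fit in one byte for this toy keyspace")
    i = bisect.bisect_right(PARTITION_STARTS, hash_byte) - 1
    return PARTITION_NAMES[i]

def replicas_for_hash(hash_byte):
    """Hosts holding the three copies of the owning partition."""
    return REPLICAS[partition_for_hash(hash_byte)]

print(partition_for_hash(0x7B), replicas_for_hash(0x7B))
# B ['Host B', 'Host F', 'Host I']
```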
19. DynamoDB Streams
• Ordered stream of item changes
• Exactly once, strictly ordered by key
• Highly durable, scalable
• 24-hour retention
• Sub-second latency
• Compatible with Kinesis Client Library
[Diagram: updates to Partitions A, B, and C of an Amazon DynamoDB table are written to the shards of a DynamoDB Streams stream; an application reads them with GetRecords through Amazon Kinesis Client Library (KCL) workers.]
Shards have a lineage and automatically close after a period of time or when the associated DynamoDB partition splits.
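A common use of the stream is keeping a copy of the table up to date. This sketch applies stream records to an in-memory replica in sequence-number order, assuming records from a single shard and a stream view type that includes new images (NEW_IMAGE or NEW_AND_OLD_IMAGES). The record fields follow the DynamoDB Streams record shape; `key_of`, `apply_stream_records`, and the sample records are hypothetical.

```python
def key_of(record):
    """Hashable key built from the record's typed Keys map, e.g. {"OrderId": {"N": "1"}}."""
    keys = record["dynamodb"]["Keys"]
    return tuple(sorted((name, tuple(sorted(value.items()))) for name, value in keys.items()))

def apply_stream_records(records, replica):
    """Apply one shard's records to an in-memory replica, in sequence order.
    Records in a shard already arrive in order; sorting is only defensive."""
    for record in sorted(records, key=lambda r: int(r["dynamodb"]["SequenceNumber"])):
        key = key_of(record)
        if record["eventName"] in ("INSERT", "MODIFY"):
            replica[key] = record["dynamodb"]["NewImage"]
        elif record["eventName"] == "REMOVE":
            replica.pop(key, None)
    return replica

records = [
    {"eventName": "INSERT", "dynamodb": {"Keys": {"OrderId": {"N": "1"}},
     "NewImage": {"OrderId": {"N": "1"}, "Status": {"S": "new"}}, "SequenceNumber": "100"}},
    {"eventName": "INSERT", "dynamodb": {"Keys": {"OrderId": {"N": "2"}},
     "NewImage": {"OrderId": {"N": "2"}}, "SequenceNumber": "150"}},
    {"eventName": "MODIFY", "dynamodb": {"Keys": {"OrderId": {"N": "1"}},
     "NewImage": {"OrderId": {"N": "1"}, "Status": {"S": "shipped"}}, "SequenceNumber": "200"}},
    {"eventName": "REMOVE", "dynamodb": {"Keys": {"OrderId": {"N": "2"}},
     "SequenceNumber": "300"}},
]
print(apply_stream_records(records, {}))
```

Because ordering is guaranteed per key, the replica ends with order 1 “shipped” and order 2 removed even if the list is handed over out of order.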
20. TTL job
Time-To-Live (TTL) removes data that is no longer relevant. The TTL attribute is an epoch timestamp marking when an item can be deleted by a background process, without consuming any provisioned capacity.
[Diagram: a TTL job removes an expired item (OrderId: 1, CustomerId: 1, MyTTL: 1492641900) from the CustomerActiveOrder table in Amazon DynamoDB, with DynamoDB Streams, Amazon Kinesis, and Amazon Redshift shown downstream.]
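Setting TTL means writing an epoch-seconds Number into the attribute the table is configured to watch. This sketch reuses the slide's attribute name `MyTTL`; `with_ttl` and `is_expired` are hypothetical helpers. Because deletion is eventual, expired items can still come back from reads until the background process removes them, so the sketch also shows a read-side check.

```python
import time

TTL_ATTRIBUTE = "MyTTL"  # attribute name taken from the slide's example item

def with_ttl(item, lifetime_seconds, now=None):
    """Return a copy of item with an epoch-seconds expiry attribute."""
    now = time.time() if now is None else now
    return {**item, TTL_ATTRIBUTE: int(now + lifetime_seconds)}

def is_expired(item, now=None):
    """True if the item's TTL is at or before now. Items without a numeric
    TTL attribute never expire, mirroring how TTL ignores them."""
    now = time.time() if now is None else now
    expiry = item.get(TTL_ATTRIBUTE)
    if isinstance(expiry, bool) or not isinstance(expiry, (int, float)):
        return False
    return expiry <= now

# An order that should disappear one hour after 1492638300.
order = with_ttl({"OrderId": 1, "CustomerId": 1}, 3600, now=1492638300)
print(order)  # {'OrderId': 1, 'CustomerId': 1, 'MyTTL': 1492641900}
```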
21. Time-To-Live (TTL)
• TTL items identifiable in DynamoDB Streams
• Configuration protected by AWS Identity and Access Management (IAM), auditable with AWS CloudTrail
• Eventual deletion, free to use
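In DynamoDB Streams, a TTL deletion arrives as a REMOVE record whose `userIdentity` names the DynamoDB service itself (type `Service`, principal `dynamodb.amazonaws.com`), which is how downstream consumers can tell expirations apart from application deletes. A minimal sketch with hypothetical function names, for example to archive expired items separately:

```python
def is_ttl_deletion(record):
    """True if a DynamoDB Streams record is a delete performed by the TTL
    background process rather than by an application."""
    identity = record.get("userIdentity") or {}
    return (
        record.get("eventName") == "REMOVE"
        and identity.get("type") == "Service"
        and identity.get("principalId") == "dynamodb.amazonaws.com"
    )

def split_removals(records):
    """Separate TTL expirations from user-initiated deletes; ignore other events."""
    expired, user_deleted = [], []
    for record in records:
        if record.get("eventName") != "REMOVE":
            continue
        (expired if is_ttl_deletion(record) else user_deleted).append(record)
    return expired, user_deleted

ttl_record = {"eventName": "REMOVE",
              "userIdentity": {"type": "Service", "principalId": "dynamodb.amazonaws.com"},
              "dynamodb": {"Keys": {"OrderId": {"N": "1"}}}}
user_record = {"eventName": "REMOVE", "dynamodb": {"Keys": {"OrderId": {"N": "2"}}}}
expired, user_deleted = split_removals([ttl_record, user_record])
print(len(expired), len(user_deleted))  # 1 1
```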
24. DynamoDB Accelerator (DAX)
• Private IP, client-side discovery
• Supports AWS Java SDK on launch, with more AWS SDKs to come
• Cluster based, Multi-AZ
• Separate query and item cache
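As a rough mental model only, not the DAX client API, here is a toy read-through, write-through cache with separate item and query caches. The class, its three database callables, the 5-minute default TTL, and the choice to leave cached query results untouched on writes are all assumptions of this sketch.

```python
import time

class TwoLevelCache:
    """Toy read-through/write-through cache with separate item and query caches.
    The three callables stand in for database calls (hypothetical)."""

    def __init__(self, get_item_from_db, query_db, put_item_to_db,
                 ttl_seconds=300, clock=time.monotonic):
        self._get_item_from_db = get_item_from_db
        self._query_db = query_db
        self._put_item_to_db = put_item_to_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}    # item key -> (expires_at, item)
        self._queries = {}  # hashable query params -> (expires_at, results)
        self.misses = 0

    def _read_through(self, cache, key, load):
        now = self._clock()
        entry = cache.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # fresh hit
        self.misses += 1             # miss or expired entry: go to the database
        value = load()
        cache[key] = (now + self._ttl, value)
        return value

    def get_item(self, key):
        return self._read_through(self._items, key, lambda: self._get_item_from_db(key))

    def query(self, params):
        return self._read_through(self._queries, params, lambda: self._query_db(params))

    def put_item(self, key, item):
        # Write through: database first, then refresh the item cache.
        # Cached query results are not updated and may be stale until they expire.
        self._put_item_to_db(key, item)
        self._items[key] = (self._clock() + self._ttl, item)

db = {"order#1": {"Status": "new"}}
cache = TwoLevelCache(db.get, lambda params: sorted(db), db.__setitem__)
cache.get_item("order#1")
cache.get_item("order#1")
print(cache.misses)  # 1
```

Keeping the two caches separate means a write can refresh the single item it touched without having to work out which cached query results it affects.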
25. DynamoDB in the VPC
[Diagram: a VPC with private subnets in Availability Zone #1 and Availability Zone #2. Web app servers and AWS Lambda, each in a security group, reach DynamoDB through a VPC endpoint and reach DAX nodes, also in security groups, inside the VPC.]
DAX
• Microsecond-latency in-memory cache
• Millions of requests per second
• Fully managed, highly available
• Role-based access control
• No IGW or VPC endpoint required
VPC Endpoints
• DynamoDB in the VPC
• IAM resource policy restricted