SlideShare a Scribd company logo
1 of 74
Download to read offline
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Rick Houlihan, Principal TPM – DBS NoSQL
August 11, 2016
Deep Dive on Amazon DynamoDB
Agenda
• Why NoSQL?
• Scaling Amazon DynamoDB
• Data modeling
• Design patterns and best practices
• Event-driven applications and DynamoDB
Streams
Timeline of database technology
DataPressure
Data volume since 2010
DataVolume
Historical Current
 90 percent of stored data
generated in last 2 years
 1 terabyte of data in 2010
equals 6.5 petabytes today
 Linear correlation between
data pressure and technical
innovation
 No reason these trends will
not continue over time
Technology adoption and the hype curve
Why NoSQL?
Optimized for storage Optimized for compute
Normalized/relational Denormalized/hierarchical
Ad hoc queries Instantiated views
Scale vertically Scale horizontally
Good for OLAP Built for OLTP at scale
SQL NoSQL
Amazon DynamoDB
Document or Key-Value Scales to Any WorkloadFully Managed NoSQL
Access Control Event Driven ProgrammingFast and Consistent
Scaling
Scaling
Throughput
Provision any amount of throughput to a table
Size
Add any number of items to a table
• Max item size is 400 KB
• Local secondary indexes (LSIs) limit the number of range keys due
to 10 GB limit
Scaling
Achieved through partitioning
Throughput
Provisioned at the table level
• Write capacity units (WCUs) are measured in 1 KB per second
• Read capacity units (RCUs) are measured in 4 KB per second
• RCUs measure strictly consistent reads
• Eventually consistent reads cost 1/2 of consistent reads
Read and write throughput limits are independent
WCURCU
Partitioning math
In the future, these details might change…
Number of Partitions
By capacity (Total RCU / 3000) + (Total WCU / 1000)
By size Total Size / 10 GB
Total partitions CEILING (MAX (Capacity, Size))
Partitioning example Table size = 8 GB, RCUs = 5000, WCUs = 500
RCUs per partition = 5000/3 = 1666.67
WCUs per partition = 500/3 = 166.67
Data/partition = 10/3 = 3.33 GB
RCUs and WCUs are uniformly
spread across partitions
Number of Partitions
By Capacity (5000 / 3000) + (500 / 1000) = 2.17
By size 8 / 10 = 0.8
Total partitions CEILING (MAX (2.17, 0.8)) = 3
What causes throttling?
If sustained throughput goes beyond provisioned throughput per partition.
Nonuniform workloads
• Hot keys/hot partitions
• Very large bursts
Mixing hot data with cold data
• Use a table per time period
From the example before:
• Table created with 5,000 RCUs, 500 WCUs
• RCUs per partition = 1666.67
• WCUs per partition = 166.67
• If sustained throughput > (1,666 RCUs or 166 WCUs) per key or partition, DynamoDB
might throttle requests
• Solution: increase provisioned throughput
What bad NoSQL looks like…
Partition
Time
Heat
Getting the most out of DynamoDB throughput
“To get the most out of DynamoDB
throughput, create tables where
the hash key element has a large
number of distinct values, and
values are requested fairly
uniformly, as randomly as
possible.”
—DynamoDB Developer Guide
Space: access is evenly
spread over the key space
Time: requests arrive evenly
spaced in time
Much better picture…
Data modeling
1:1 relationships or key-values
Use a table or global secondary index (GSI) with an
alternate partition key
Use GetItem or BatchGetItem API action
Example:
Given an SSN or license number, get attributes.
Users Table
Partition key Attributes
SSN = 123-45-6789 Email = johndoe@nowhere.com, License = TDL25478134
SSN = 987-65-4321 Email = maryfowler@somewhere.com, License = TDL78309234
Users-Email-GSI
Partition key Attributes
License = TDL78309234 Email = maryfowler@somewhere.com, SSN = 987-65-4321
License = TDL25478134 Email = johndoe@nowhere.com, SSN = 123-45-6789
1:N relationships or parent-children
Use a table or GSI with partition and sort key
Use Query API
Example:
Given a device, find all readings between epoch X, Y
Device-measurements
Partition Key Sort key Attributes
DeviceId = 1 epoch = 5513A97C Temperature = 30, pressure = 90
DeviceId = 1 epoch = 5513A9DB Temperature = 30, pressure = 90
N:M relationships
Use a table and GSI with partition and sort key elements
switched
Use Query API
Example:
Given a user, find all games. Or given a game, find all users
User-Games-Table
Partition Key Sort key
UserId = bob GameId = Game1
UserId = fred GameId = Game2
UserId = bob GameId = Game3
Game-Users-GSI
Partition Key Sort key
GameId = Game1 UserId = bob
GameId = Game2 UserId = fred
GameId = Game3 UserId = bob
Hierarchical data
Tiered relational data structures
SQL vs. NoSQL access pattern
Hierarchical data structures as items
Use composite sort key to define a hierarchy
Highly selective result sets with sort queries
Index anything, scales to any size
Primary Key
Attributes
ProductID type
Items
1 bookID
title author genre publisher datePublished ISBN
Ringworld Larry Niven Science Fiction Ballantine Oct-70 0-345-02046-4
2 albumID
title artist genre label studio released producer
Dark Side of the Moon Pink Floyd Progressive Rock Harvest Abbey Road 3/1/73 Pink Floyd
2 albumID:trackID
title length music vocals
Speak to Me 1:30 Mason Instrumental
2 albumID:trackID
title length music vocals
Breathe 2:43 Waters, Gilmour, Wright Gilmour
2 albumID:trackID
title length music vocals
On the Run 3:30 Gilmour, Waters Instrumental
3 movieID
title genre writer producer
Idiocracy Scifi Comedy Mike Judge 20th Century Fox
3 movieID:actorID
name character image
Luke Wilson Joe Bowers img2.jpg
3 movieID:actorID
name character image
Maya Rudolph Rita img3.jpg
3 movieID:actorID
name character image
Dax Shepard Frito Pendejo img1.jpg
… or as documents (JSON)
JSON data types (M, L, BOOL, NULL)
Document SDKs available
Indexing only by using DynamoDB Streams or AWS Lambda
400 KB maximum item size (limits hierarchical data structure)
Primary Key
Attributes
ProductID
Items
1
id title author genre publisher datePublished ISBN
bookID Ringworld Larry Niven Science Fiction Ballantine Oct-70 0-345-02046-4
2
id title artist genre Attributes
albumID
Dark Side of the
Moon
Pink Floyd Progressive Rock
{ label:"Harvest", studio: "Abbey Road", published: "3/1/73", producer: "Pink
Floyd", tracks: [{title: "Speak to Me", length: "1:30", music: "Mason", vocals:
"Instrumental"},{title: ”Breathe", length: ”2:43", music: ”Waters, Gilmour,
Wright", vocals: ”Gilmour"},{title: ”On the Run", length: “3:30", music: ”Gilmour,
Waters", vocals: "Instrumental"}]}
3
id title genre writer Attributes
movieID Idiocracy Scifi Comedy Mike Judge
{ producer: "20th Century Fox", actors: [{ name: "Luke Wilson", dob: "9/21/71",
character: "Joe Bowers", image: "img2.jpg"},{ name: "Maya Rudolph", dob:
"7/27/72", character: "Rita", image: "img1.jpg"},{ name: "Dax Shepard", dob:
"1/2/75", character: "Frito Pendejo", image: "img3.jpg"}]
Multiversion concurrency control
• Use a metadata items to manage current object
collection
SET #ItemList.items = list_append(:newList, #ItemList.items )
• Remove old items or add version/state to sort keys
after writes
Use composite key queries to get specific versions
• Readers can choose how to handle “fat” reads
Design patterns and best practices
Event logging
Storing time series data
Time series tables
Events_table_2015_April
Event_id
(Partition)
Timestamp
(Sort)
Attribute1 …. Attribute N
Events_table_2015_March
Event_id
(Partition)
Timestamp
(Sort)
Attribute1 …. Attribute N
Events_table_2015_Feburary
Event_id
(Partition)
Timestamp
(Sort)
Attribute1 …. Attribute N
Events_table_2015_January
Event_id
(Partition)
Timestamp
(Sort)
Attribute1 …. Attribute N
RCUs = 1000
WCUs = 1
RCUs = 10000
WCUs = 10000
RCUs = 100
WCUs = 1
RCUs = 10
WCUs = 1
Current table
Older tables
HotdataColddata
Don’t mix hot and cold data; archive cold data to Amazon S3
Use a table per time period
Precreate daily, weekly, monthly tables
Provision required throughput for current table
Writes go to the current table
Turn off (or reduce) throughput for older tables
Dealing with time series data
Product catalog
Popular items (read)
Partition 1
2000 RCUs
Partition K
2000 RCUs
Partition M
2000 RCUs
Partition 50
2000 RCU
Scaling bottlenecks
Product A Product B
Shoppers
ProductCatalog Table
SELECT Id, Description, ...
FROM ProductCatalog
WHERE Id="POPULAR_PRODUCT"
Partition 1 Partition 2
ProductCatalog Table
User
DynamoDB
User
Cache
popular items
SELECT Id, Description, ...
FROM ProductCatalog
WHERE Id="POPULAR_PRODUCT"
Messaging app
Large items
Filters vs. indexes
M:N Modeling—inbox and outbox
Messages
table
Messages app
David
SELECT *
FROM Messages
WHERE Recipient='David'
LIMIT 50
ORDER BY Date DESC
Inbox
SELECT *
FROM Messages
WHERE Sender ='David'
LIMIT 50
ORDER BY Date DESC
Outbox
Recipient Date Sender Message
David 2014-10-02 Bob …
… 48 more messages for David …
David 2014-10-03 Alice …
Alice 2014-09-28 Bob …
Alice 2014-10-01 Carol …
Large and small attributes mixed
(Many more messages)
David
Messages table
50 items × 256 KB each
Partition key Sort key
Large message bodies
Attachments
SELECT *
FROM Messages
WHERE Recipient='David'
LIMIT 50
ORDER BY Date DESC
Inbox
Computing inbox query cost
Items evaluated by query
Average item size
Conversion ratio
Eventually consistent reads
50 * 256KB * (1 RCU / 4 KB) * (1 / 2) = 1600 RCU
Recipient Date Sender Subject MsgId
David 2014-10-02 Bob Hi!… afed
David 2014-10-03 Alice RE: The… 3kf8
Alice 2014-09-28 Bob FW: Ok… 9d2b
Alice 2014-10-01 Carol Hi!... ct7r
Separate the bulk data
Inbox-GSI Messages table
MsgId Body
9d2b …
3kf8 …
ct7r …
afed …
David
1. Query inbox-GSI: 1 RCU
2. BatchGetItem messages: 1600 RCU
(50 separate items at 256 KB)
(50 sequential items at 128 bytes)
Uniformly distributes large item reads
Inbox GSI
Define which attributes to copy into the index
Outbox Sender
Outbox GSI
SELECT *
FROM Messages
WHERE Sender ='David'
LIMIT 50
ORDER BY Date DESC
Messaging app
Messages
table
David
Inbox
global secondary
index
Inbox
Outbox
global secondary
index
Outbox
Reduce one-to-many item sizes
Configure secondary index projections
Use GSIs to model M:N relationship
between sender and recipient
Distribute large items
Querying many large items at once
InboxMessagesOutbox
Multiplayer online gaming
Query filters vs.
composite key indexes
GameId Date Host Opponent Status
d9bl3 2014-10-02 David Alice DONE
72f49 2014-09-30 Alice Bob PENDING
o2pnb 2014-10-08 Bob Carol IN_PROGRESS
b932s 2014-10-03 Carol Bob PENDING
ef9ca 2014-10-03 David Bob IN_PROGRESS
Games table
Multivalue sorts and filters
Partition key
Query for incoming game requests
DynamoDB indexes provide partition and sort
What about queries for two equalities and a sort?
SELECT * FROM Game
WHERE Opponent='Bob‘
AND Status=‘PENDING'
ORDER BY Date DESC
(hash)
(range)
(?)
Secondary index
Opponent Date GameId Status Host
Alice 2014-10-02 d9bl3 DONE David
Carol 2014-10-08 o2pnb IN_PROGRESS Bob
Bob 2014-09-30 72f49 PENDING Alice
Bob 2014-10-03 b932s PENDING Carol
Bob 2014-10-03 ef9ca IN_PROGRESS David
Approach 1: Query filter
BobPartition key Sort key
Secondary Index
Approach 1: Query filter
Bob
Opponent Date GameId Status Host
Alice 2014-10-02 d9bl3 DONE David
Carol 2014-10-08 o2pnb IN_PROGRESS Bob
Bob 2014-09-30 72f49 PENDING Alice
Bob 2014-10-03 b932s PENDING Carol
Bob 2014-10-03 ef9ca IN_PROGRESS David
SELECT * FROM Game
WHERE Opponent='Bob'
ORDER BY Date DESC
FILTER ON Status='PENDING'
(filtered out)
Needle in a haystack
Bob
Send back less data “on the wire”
Simplify application code
Simple SQL-like expressions
• AND, OR, NOT, ()
Use query filter
Your index isn’t entirely selective
Approach 2: Composite key
StatusDate
DONE_2014-10-02
IN_PROGRESS_2014-10-08
IN_PROGRESS_2014-10-03
PENDING_2014-09-30
PENDING_2014-10-03
Status
DONE
IN_PROGRESS
IN_PROGRESS
PENDING
PENDING
Date
2014-10-02
2014-10-08
2014-10-03
2014-10-03
2014-09-30
+ =
Secondary Index
Approach 2: Composite key
Opponent StatusDate GameId Host
Alice DONE_2014-10-02 d9bl3 David
Carol IN_PROGRESS_2014-10-08 o2pnb Bob
Bob IN_PROGRESS_2014-10-03 ef9ca David
Bob PENDING_2014-09-30 72f49 Alice
Bob PENDING_2014-10-03 b932s Carol
Partition key Sort key
Opponent StatusDate GameId Host
Alice DONE_2014-10-02 d9bl3 David
Carol IN_PROGRESS_2014-10-08 o2pnb Bob
Bob IN_PROGRESS_2014-10-03 ef9ca David
Bob PENDING_2014-09-30 72f49 Alice
Bob PENDING_2014-10-03 b932s Carol
Secondary index
Approach 2: Composite key
Bob
SELECT * FROM Game
WHERE Opponent='Bob'
AND StatusDate BEGINS_WITH 'PENDING'
Needle in a sorted haystack
Bob
Sparse indexes
Id
(Partition)
User Game Score Date Award
1 Bob G1 1300 2012-12-23
2 Bob G1 1450 2012-12-23
3 Jay G1 1600 2012-12-24
4 Mary G1 2000 2012-10-24 Champ
5 Ryan G2 123 2012-03-10
6 Jones G2 345 2012-03-20
Game-scores-table
Award
(Partition)
Id User Score
Champ 4 Mary 2000
Award-GSI
Scan sparse GSIs
Concatenate attributes to form useful
secondary index keys
Take advantage of sparse indexes
Replace filter with indexes
You want to optimize a query as much
as possible
Status + Date
Real-time voting
Write-heavy items
Partition 1
1000 WCUs
Partition K
1000 WCUs
Partition M
1000 WCUs
Partition N
1000 WCUs
Votes Table
Candidate A Candidate B
Scaling bottlenecks
Voters
Provision 200,000 WCUs
Write sharding
Candidate A_2
Candidate B_1
Candidate B_2
Candidate B_3
Candidate B_5
Candidate B_4
Candidate B_7
Candidate B_6
Candidate A_1
Candidate A_3
Candidate A_4
Candidate A_7 Candidate B_8
Candidate A_6 Candidate A_8
Candidate A_5
Voter
Votes Table
Write sharding
Candidate A_2
Candidate B_1
Candidate B_2
Candidate B_3
Candidate B_5
Candidate B_4
Candidate B_7
Candidate B_6
Candidate A_1
Candidate A_3
Candidate A_4
Candidate A_7 Candidate B_8
UpdateItem: “CandidateA_” + rand(0, 10)
ADD 1 to Votes
Candidate A_6 Candidate A_8
Candidate A_5
Voter
Votes Table
Votes Table
Shard aggregation
Candidate A_2
Candidate B_1
Candidate B_2
Candidate B_3
Candidate B_5
Candidate B_4
Candidate B_7
Candidate B_6
Candidate A_1
Candidate A_3
Candidate A_4
Candidate A_5
Candidate A_6 Candidate A_8
Candidate A_7 Candidate B_8
Periodic
process
Candidate A
Total: 2.5M
1. Sum
2. Store Voter
Trade off read cost for write scalability
Consider throughput per partition key
Shard write-heavy partition keys
Your write workload is not horizontally
scalable
Event-driven applications and
DynamoDB Streams
Stream of updates to a table
Asynchronous
Exactly once
Strictly ordered
• Per item
Highly durable
• Scale with table
24-hour lifetime
Subsecond latency
DynamoDB Streams
View type Destination
Old image—before update Name = John, Destination = Mars
New image—after update Name = John, Destination = Pluto
Old and new images Name = John, Destination = Mars
Name = John, Destination = Pluto
Keys only Name = John
View types
UpdateItem (Name = John, Destination = Pluto)
Stream
Table
Partition 1
Partition 2
Partition 3
Partition 4
Partition 5
Table
Shard 1
Shard 2
Shard 3
Shard 4
KCL
Worker
KCL
Worker
KCL
Worker
KCL
Worker
Amazon Kinesis Client
Library application
DynamoDB
client application
Updates
DynamoDB Streams and
Amazon Kinesis Client Library
DynamoDB Streams
Open source cross-region
replication library
Asia Pacific (Sydney) EU (Ireland) replica
US East (N. Virginia)
Cross-region replication
DynamoDB Streams and AWS Lambda
Triggers
Lambda function
Notify change
Derivative tables
Amazon CloudSearch
Amazon ElastiCache
Analytics with
DynamoDB Streams
Collect and de-dupe data in DynamoDB
Aggregate data in memory and flush periodically
Performing real-time aggregation and
analytics
Architecture
Reference architecture
Elastic event-driven applications
Thank you!

More Related Content

What's hot

What's hot (20)

Introduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis AnalyticsIntroduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis Analytics
 
Amazon Relational Database Service Deep Dive
Amazon Relational Database Service Deep DiveAmazon Relational Database Service Deep Dive
Amazon Relational Database Service Deep Dive
 
ENT306 Migrating large Scale Data Sets to the Cloud
ENT306 Migrating large Scale Data Sets to the CloudENT306 Migrating large Scale Data Sets to the Cloud
ENT306 Migrating large Scale Data Sets to the Cloud
 
Webinar | Introduction to Amazon DynamoDB
Webinar | Introduction to Amazon DynamoDBWebinar | Introduction to Amazon DynamoDB
Webinar | Introduction to Amazon DynamoDB
 
Optimizing Storage for Big Data Analytics Workloads
Optimizing Storage for Big Data Analytics WorkloadsOptimizing Storage for Big Data Analytics Workloads
Optimizing Storage for Big Data Analytics Workloads
 
February 2016 Webinar Series - Introduction to DynamoDB
February 2016 Webinar Series - Introduction to DynamoDBFebruary 2016 Webinar Series - Introduction to DynamoDB
February 2016 Webinar Series - Introduction to DynamoDB
 
SRV405 Deep Dive Amazon Redshift & Redshift Spectrum at Cardinal Health
SRV405 Deep Dive Amazon Redshift & Redshift Spectrum at Cardinal HealthSRV405 Deep Dive Amazon Redshift & Redshift Spectrum at Cardinal Health
SRV405 Deep Dive Amazon Redshift & Redshift Spectrum at Cardinal Health
 
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
 
Getting Started with Amazon Kinesis
Getting Started with Amazon KinesisGetting Started with Amazon Kinesis
Getting Started with Amazon Kinesis
 
(BDT313) Amazon DynamoDB For Big Data
(BDT313) Amazon DynamoDB For Big Data(BDT313) Amazon DynamoDB For Big Data
(BDT313) Amazon DynamoDB For Big Data
 
AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage a...
AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage a...AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage a...
AWS re:Invent 2016: How Amazon S3 Storage Management Helps Optimize Storage a...
 
SRV420 Analyzing Streaming Data in Real-time with Amazon Kinesis
SRV420 Analyzing Streaming Data in Real-time with Amazon KinesisSRV420 Analyzing Streaming Data in Real-time with Amazon Kinesis
SRV420 Analyzing Streaming Data in Real-time with Amazon Kinesis
 
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudFSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
 
Introduction to Amazon DynamoDB
Introduction to Amazon DynamoDBIntroduction to Amazon DynamoDB
Introduction to Amazon DynamoDB
 
Log Analytics with Amazon Elasticsearch Service & Kibana
Log Analytics with Amazon Elasticsearch Service & KibanaLog Analytics with Amazon Elasticsearch Service & Kibana
Log Analytics with Amazon Elasticsearch Service & Kibana
 
Data Warehousing in the Era of Big Data: Intro to Amazon Redshift
Data Warehousing in the Era of Big Data: Intro to Amazon RedshiftData Warehousing in the Era of Big Data: Intro to Amazon Redshift
Data Warehousing in the Era of Big Data: Intro to Amazon Redshift
 
AWS re:Invent 2016: Storage State of the Union (STG201)
AWS re:Invent 2016: Storage State of the Union (STG201)AWS re:Invent 2016: Storage State of the Union (STG201)
AWS re:Invent 2016: Storage State of the Union (STG201)
 
SRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBSRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDB
 
(WRK302) Event-Driven Programming
(WRK302) Event-Driven Programming(WRK302) Event-Driven Programming
(WRK302) Event-Driven Programming
 
Optimizing Storage for Big Data/Analytics Workloads
Optimizing Storage for Big Data/Analytics WorkloadsOptimizing Storage for Big Data/Analytics Workloads
Optimizing Storage for Big Data/Analytics Workloads
 

Viewers also liked

Top 10 technical program manager interview questions and answers
Top 10 technical program manager interview questions and answersTop 10 technical program manager interview questions and answers
Top 10 technical program manager interview questions and answers
foreverlove251092
 

Viewers also liked (19)

Design Patterns using Amazon DynamoDB
 Design Patterns using Amazon DynamoDB Design Patterns using Amazon DynamoDB
Design Patterns using Amazon DynamoDB
 
Security & Compliance
Security & Compliance Security & Compliance
Security & Compliance
 
Running Lean Architectures
Running Lean ArchitecturesRunning Lean Architectures
Running Lean Architectures
 
Amazon Machine Learning #AWSLoft Berlin
Amazon Machine Learning #AWSLoft BerlinAmazon Machine Learning #AWSLoft Berlin
Amazon Machine Learning #AWSLoft Berlin
 
AWS Webcast - Dynamo DB
AWS Webcast - Dynamo DBAWS Webcast - Dynamo DB
AWS Webcast - Dynamo DB
 
Improving the innovation process with informal networks
Improving the innovation process with informal networksImproving the innovation process with informal networks
Improving the innovation process with informal networks
 
Partnership Model
Partnership ModelPartnership Model
Partnership Model
 
Deep Dive: Amazon DynamoDB
Deep Dive: Amazon DynamoDBDeep Dive: Amazon DynamoDB
Deep Dive: Amazon DynamoDB
 
Deep Dive on AWS Cloud Data Migration Services
Deep Dive on AWS Cloud Data Migration ServicesDeep Dive on AWS Cloud Data Migration Services
Deep Dive on AWS Cloud Data Migration Services
 
Top 10 technical program manager interview questions and answers
Top 10 technical program manager interview questions and answersTop 10 technical program manager interview questions and answers
Top 10 technical program manager interview questions and answers
 
AWS에 대해 가장 궁금했던 열가지 - 정우근 매니저:: AWS Cloud Track 1 Intro
AWS에 대해 가장 궁금했던 열가지 - 정우근 매니저:: AWS Cloud Track 1 IntroAWS에 대해 가장 궁금했던 열가지 - 정우근 매니저:: AWS Cloud Track 1 Intro
AWS에 대해 가장 궁금했던 열가지 - 정우근 매니저:: AWS Cloud Track 1 Intro
 
AWSome Day | Tech Track
AWSome Day | Tech TrackAWSome Day | Tech Track
AWSome Day | Tech Track
 
Amazon CloudFront Office Hour, “Using Amazon CloudFront with Amazon S3 & AWS ...
Amazon CloudFront Office Hour, “Using Amazon CloudFront with Amazon S3 & AWS ...Amazon CloudFront Office Hour, “Using Amazon CloudFront with Amazon S3 & AWS ...
Amazon CloudFront Office Hour, “Using Amazon CloudFront with Amazon S3 & AWS ...
 
Serverless Microservices
Serverless MicroservicesServerless Microservices
Serverless Microservices
 
Announcing Amazon Aurora with PostgreSQL Compatibility - January 2017 AWS Onl...
Announcing Amazon Aurora with PostgreSQL Compatibility - January 2017 AWS Onl...Announcing Amazon Aurora with PostgreSQL Compatibility - January 2017 AWS Onl...
Announcing Amazon Aurora with PostgreSQL Compatibility - January 2017 AWS Onl...
 
Introduction to Amazon Lightsail
Introduction to Amazon LightsailIntroduction to Amazon Lightsail
Introduction to Amazon Lightsail
 
Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at ScaleModern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at Scale
 
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
 
AWS 101: Cloud Computing Seminar (2012)
AWS 101: Cloud Computing Seminar (2012)AWS 101: Cloud Computing Seminar (2012)
AWS 101: Cloud Computing Seminar (2012)
 

Similar to Deep Dive on Amazon DynamoDB

Similar to Deep Dive on Amazon DynamoDB (20)

Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
Amazon DynamoDB Design Workshop
Amazon DynamoDB Design WorkshopAmazon DynamoDB Design Workshop
Amazon DynamoDB Design Workshop
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
DynamoDB Design Workshop
DynamoDB Design WorkshopDynamoDB Design Workshop
DynamoDB Design Workshop
 
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)
 
개발자가 알아야 할 Amazon DynamoDB 활용법 :: 김일호 :: AWS Summit Seoul 2016
개발자가 알아야 할 Amazon DynamoDB 활용법 :: 김일호 :: AWS Summit Seoul 2016개발자가 알아야 할 Amazon DynamoDB 활용법 :: 김일호 :: AWS Summit Seoul 2016
개발자가 알아야 할 Amazon DynamoDB 활용법 :: 김일호 :: AWS Summit Seoul 2016
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
 
(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep Dive(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep Dive
 
Amazon DynamoDB 深入探討
Amazon DynamoDB 深入探討Amazon DynamoDB 深入探討
Amazon DynamoDB 深入探討
 
Amazon Dynamo DB for Developers (김일호) - AWS DB Day
Amazon Dynamo DB for Developers (김일호) - AWS DB DayAmazon Dynamo DB for Developers (김일호) - AWS DB Day
Amazon Dynamo DB for Developers (김일호) - AWS DB Day
 
SQL to NoSQL Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...
SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...
SQL to NoSQL Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...
 
Deep Dive: Amazon DynamoDB
Deep Dive: Amazon DynamoDBDeep Dive: Amazon DynamoDB
Deep Dive: Amazon DynamoDB
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
SRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBSRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDB
 
Deep Dive - DynamoDB
Deep Dive - DynamoDBDeep Dive - DynamoDB
Deep Dive - DynamoDB
 
Intro to Amazon Redshift Spectrum: Quickly Query Exabytes of Data in S3 - Jun...
Intro to Amazon Redshift Spectrum: Quickly Query Exabytes of Data in S3 - Jun...Intro to Amazon Redshift Spectrum: Quickly Query Exabytes of Data in S3 - Jun...
Intro to Amazon Redshift Spectrum: Quickly Query Exabytes of Data in S3 - Jun...
 
Data Access Patterns
Data Access PatternsData Access Patterns
Data Access Patterns
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
Data Collection and Storage
Data Collection and StorageData Collection and Storage
Data Collection and Storage
 

More from Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Recently uploaded (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Deep Dive on Amazon DynamoDB

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Rick Houlihan, Principal TPM – DBS NoSQL August 11, 2016 Deep Dive on Amazon DynamoDB
  • 2. Agenda • Why NoSQL? • Scaling Amazon DynamoDB • Data modeling • Design patterns and best practices • Event-driven applications and DynamoDB Streams
  • 3. Timeline of database technology DataPressure
  • 4. Data volume since 2010 DataVolume Historical Current  90 percent of stored data generated in last 2 years  1 terabyte of data in 2010 equals 6.5 petabytes today  Linear correlation between data pressure and technical innovation  No reason these trends will not continue over time
  • 5. Technology adoption and the hype curve
  • 6. Why NoSQL? Optimized for storage Optimized for compute Normalized/relational Denormalized/hierarchical Ad hoc queries Instantiated views Scale vertically Scale horizontally Good for OLAP Built for OLTP at scale SQL NoSQL
  • 7. Amazon DynamoDB Document or Key-Value Scales to Any WorkloadFully Managed NoSQL Access Control Event Driven ProgrammingFast and Consistent
  • 9. Scaling Throughput Provision any amount of throughput to a table Size Add any number of items to a table • Max item size is 400 KB • Local secondary indexes (LSIs) limit the number of range keys due to 10 GB limit Scaling Achieved through partitioning
  • 10. Throughput Provisioned at the table level • Write capacity units (WCUs) are measured in 1 KB per second • Read capacity units (RCUs) are measured in 4 KB per second • RCUs measure strictly consistent reads • Eventually consistent reads cost 1/2 of consistent reads Read and write throughput limits are independent WCURCU
  • 11. Partitioning math In the future, these details might change… Number of Partitions By capacity (Total RCU / 3000) + (Total WCU / 1000) By size Total Size / 10 GB Total partitions CEILING (MAX (Capacity, Size))
  • 12. Partitioning example Table size = 8 GB, RCUs = 5000, WCUs = 500 RCUs per partition = 5000/3 = 1666.67 WCUs per partition = 500/3 = 166.67 Data/partition = 10/3 = 3.33 GB RCUs and WCUs are uniformly spread across partitions Number of Partitions By Capacity (5000 / 3000) + (500 / 1000) = 2.17 By size 8 / 10 = 0.8 Total partitions CEILING (MAX (2.17, 0.8)) = 3
  • 13. What causes throttling? If sustained throughput goes beyond provisioned throughput per partition. Nonuniform workloads • Hot keys/hot partitions • Very large bursts Mixing hot data with cold data • Use a table per time period From the example before: • Table created with 5,000 RCUs, 500 WCUs • RCUs per partition = 1666.67 • WCUs per partition = 166.67 • If sustained throughput > (1,666 RCUs or 166 WCUs) per key or partition, DynamoDB might throttle requests • Solution: increase provisioned throughput
  • 14. What bad NoSQL looks like… Partition Time Heat
  • 15. Getting the most out of DynamoDB throughput “To get the most out of DynamoDB throughput, create tables where the hash key element has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible.” —DynamoDB Developer Guide Space: access is evenly spread over the key space Time: requests arrive evenly spaced in time
  • 18. 1:1 relationships or key-values Use a table or global secondary index (GSI) with an alternate partition key Use GetItem or BatchGetItem API action Example: Given an SSN or license number, get attributes. Users Table Partition key Attributes SSN = 123-45-6789 Email = johndoe@nowhere.com, License = TDL25478134 SSN = 987-65-4321 Email = maryfowler@somewhere.com, License = TDL78309234 Users-Email-GSI Partition key Attributes License = TDL78309234 Email = maryfowler@somewhere.com, SSN = 987-65-4321 License = TDL25478134 Email = johndoe@nowhere.com, SSN = 123-45-6789
  • 19. 1:N relationships or parent-children Use a table or GSI with partition and sort key Use Query API Example: Given a device, find all readings between epoch X, Y Device-measurements Partition Key Sort key Attributes DeviceId = 1 epoch = 5513A97C Temperature = 30, pressure = 90 DeviceId = 1 epoch = 5513A9DB Temperature = 30, pressure = 90
  • 20. N:M relationships Use a table and GSI with partition and sort key elements switched Use Query API Example: Given a user, find all games. Or given a game, find all users User-Games-Table Partition Key Sort key UserId = bob GameId = Game1 UserId = fred GameId = Game2 UserId = bob GameId = Game3 Game-Users-GSI Partition Key Sort key GameId = Game1 UserId = bob GameId = Game2 UserId = fred GameId = Game3 UserId = bob
  • 22. SQL vs. NoSQL access pattern
  • 23. Hierarchical data structures as items Use composite sort key to define a hierarchy Highly selective result sets with sort queries Index anything, scales to any size Primary Key Attributes ProductID type Items 1 bookID title author genre publisher datePublished ISBN Ringworld Larry Niven Science Fiction Ballantine Oct-70 0-345-02046-4 2 albumID title artist genre label studio released producer Dark Side of the Moon Pink Floyd Progressive Rock Harvest Abbey Road 3/1/73 Pink Floyd 2 albumID:trackID title length music vocals Speak to Me 1:30 Mason Instrumental 2 albumID:trackID title length music vocals Breathe 2:43 Waters, Gilmour, Wright Gilmour 2 albumID:trackID title length music vocals On the Run 3:30 Gilmour, Waters Instrumental 3 movieID title genre writer producer Idiocracy Scifi Comedy Mike Judge 20th Century Fox 3 movieID:actorID name character image Luke Wilson Joe Bowers img2.jpg 3 movieID:actorID name character image Maya Rudolph Rita img3.jpg 3 movieID:actorID name character image Dax Shepard Frito Pendejo img1.jpg
  • 24. … or as documents (JSON) JSON data types (M, L, BOOL, NULL) Document SDKs available Indexing only by using DynamoDB Streams or AWS Lambda 400 KB maximum item size (limits hierarchical data structure) Primary Key Attributes ProductID Items 1 id title author genre publisher datePublished ISBN bookID Ringworld Larry Niven Science Fiction Ballantine Oct-70 0-345-02046-4 2 id title artist genre Attributes albumID Dark Side of the Moon Pink Floyd Progressive Rock { label:"Harvest", studio: "Abbey Road", published: "3/1/73", producer: "Pink Floyd", tracks: [{title: "Speak to Me", length: "1:30", music: "Mason", vocals: "Instrumental"},{title: ”Breathe", length: ”2:43", music: ”Waters, Gilmour, Wright", vocals: ”Gilmour"},{title: ”On the Run", length: “3:30", music: ”Gilmour, Waters", vocals: "Instrumental"}]} 3 id title genre writer Attributes movieID Idiocracy Scifi Comedy Mike Judge { producer: "20th Century Fox", actors: [{ name: "Luke Wilson", dob: "9/21/71", character: "Joe Bowers", image: "img2.jpg"},{ name: "Maya Rudolph", dob: "7/27/72", character: "Rita", image: "img1.jpg"},{ name: "Dax Shepard", dob: "1/2/75", character: "Frito Pendejo", image: "img3.jpg"}]
  • 25. Multiversion concurrency control • Use a metadata items to manage current object collection SET #ItemList.items = list_append(:newList, #ItemList.items ) • Remove old items or add version/state to sort keys after writes Use composite key queries to get specific versions • Readers can choose how to handle “fat” reads
  • 26. Design patterns and best practices
  • 28. Time series tables Events_table_2015_April Event_id (Partition) Timestamp (Sort) Attribute1 …. Attribute N Events_table_2015_March Event_id (Partition) Timestamp (Sort) Attribute1 …. Attribute N Events_table_2015_Feburary Event_id (Partition) Timestamp (Sort) Attribute1 …. Attribute N Events_table_2015_January Event_id (Partition) Timestamp (Sort) Attribute1 …. Attribute N RCUs = 1000 WCUs = 1 RCUs = 10000 WCUs = 10000 RCUs = 100 WCUs = 1 RCUs = 10 WCUs = 1 Current table Older tables HotdataColddata Don’t mix hot and cold data; archive cold data to Amazon S3
  • 29. Use a table per time period Precreate daily, weekly, monthly tables Provision required throughput for current table Writes go to the current table Turn off (or reduce) throughput for older tables Dealing with time series data
  • 31. Partition 1 2000 RCUs Partition K 2000 RCUs Partition M 2000 RCUs Partition 50 2000 RCU Scaling bottlenecks Product A Product B Shoppers ProductCatalog Table SELECT Id, Description, ... FROM ProductCatalog WHERE Id="POPULAR_PRODUCT"
  • 32.
  • 33. Partition 1 Partition 2 ProductCatalog Table User DynamoDB User Cache popular items SELECT Id, Description, ... FROM ProductCatalog WHERE Id="POPULAR_PRODUCT"
  • 34.
  • 35. Messaging app Large items Filters vs. indexes M:N Modeling—inbox and outbox
  • 36. Messages table Messages app David SELECT * FROM Messages WHERE Recipient='David' LIMIT 50 ORDER BY Date DESC Inbox SELECT * FROM Messages WHERE Sender ='David' LIMIT 50 ORDER BY Date DESC Outbox
  • 37. Recipient Date Sender Message David 2014-10-02 Bob … … 48 more messages for David … David 2014-10-03 Alice … Alice 2014-09-28 Bob … Alice 2014-10-01 Carol … Large and small attributes mixed (Many more messages) David Messages table 50 items × 256 KB each Partition key Sort key Large message bodies Attachments SELECT * FROM Messages WHERE Recipient='David' LIMIT 50 ORDER BY Date DESC Inbox
  • 38. Computing inbox query cost Items evaluated by query Average item size Conversion ratio Eventually consistent reads 50 * 256KB * (1 RCU / 4 KB) * (1 / 2) = 1600 RCU
  • 39. Recipient Date Sender Subject MsgId David 2014-10-02 Bob Hi!… afed David 2014-10-03 Alice RE: The… 3kf8 Alice 2014-09-28 Bob FW: Ok… 9d2b Alice 2014-10-01 Carol Hi!... ct7r Separate the bulk data Inbox-GSI Messages table MsgId Body 9d2b … 3kf8 … ct7r … afed … David 1. Query inbox-GSI: 1 RCU 2. BatchGetItem messages: 1600 RCU (50 separate items at 256 KB) (50 sequential items at 128 bytes) Uniformly distributes large item reads
  • 40. Inbox GSI Define which attributes to copy into the index
  • 41. Outbox Sender Outbox GSI SELECT * FROM Messages WHERE Sender ='David' LIMIT 50 ORDER BY Date DESC
  • 43. Reduce one-to-many item sizes Configure secondary index projections Use GSIs to model M:N relationship between sender and recipient Distribute large items Querying many large items at once InboxMessagesOutbox
  • 44. Multiplayer online gaming Query filters vs. composite key indexes
  • 45. GameId Date Host Opponent Status d9bl3 2014-10-02 David Alice DONE 72f49 2014-09-30 Alice Bob PENDING o2pnb 2014-10-08 Bob Carol IN_PROGRESS b932s 2014-10-03 Carol Bob PENDING ef9ca 2014-10-03 David Bob IN_PROGRESS Games table Multivalue sorts and filters Partition key
  • 46. Query for incoming game requests DynamoDB indexes provide partition and sort What about queries for two equalities and a sort? SELECT * FROM Game WHERE Opponent='Bob‘ AND Status=‘PENDING' ORDER BY Date DESC (hash) (range) (?)
  • 47. Secondary index Opponent Date GameId Status Host Alice 2014-10-02 d9bl3 DONE David Carol 2014-10-08 o2pnb IN_PROGRESS Bob Bob 2014-09-30 72f49 PENDING Alice Bob 2014-10-03 b932s PENDING Carol Bob 2014-10-03 ef9ca IN_PROGRESS David Approach 1: Query filter BobPartition key Sort key
  • 48. Secondary Index Approach 1: Query filter Bob Opponent Date GameId Status Host Alice 2014-10-02 d9bl3 DONE David Carol 2014-10-08 o2pnb IN_PROGRESS Bob Bob 2014-09-30 72f49 PENDING Alice Bob 2014-10-03 b932s PENDING Carol Bob 2014-10-03 ef9ca IN_PROGRESS David SELECT * FROM Game WHERE Opponent='Bob' ORDER BY Date DESC FILTER ON Status='PENDING' (filtered out)
  • 49. Needle in a haystack Bob
  • 50. Send back less data “on the wire” Simplify application code Simple SQL-like expressions • AND, OR, NOT, () Use query filter Your index isn’t entirely selective
  • 51. Approach 2: Composite key StatusDate DONE_2014-10-02 IN_PROGRESS_2014-10-08 IN_PROGRESS_2014-10-03 PENDING_2014-09-30 PENDING_2014-10-03 Status DONE IN_PROGRESS IN_PROGRESS PENDING PENDING Date 2014-10-02 2014-10-08 2014-10-03 2014-10-03 2014-09-30 + =
  • 52. Secondary Index Approach 2: Composite key Opponent StatusDate GameId Host Alice DONE_2014-10-02 d9bl3 David Carol IN_PROGRESS_2014-10-08 o2pnb Bob Bob IN_PROGRESS_2014-10-03 ef9ca David Bob PENDING_2014-09-30 72f49 Alice Bob PENDING_2014-10-03 b932s Carol Partition key Sort key
  • 53. Opponent StatusDate GameId Host Alice DONE_2014-10-02 d9bl3 David Carol IN_PROGRESS_2014-10-08 o2pnb Bob Bob IN_PROGRESS_2014-10-03 ef9ca David Bob PENDING_2014-09-30 72f49 Alice Bob PENDING_2014-10-03 b932s Carol Secondary index Approach 2: Composite key Bob SELECT * FROM Game WHERE Opponent='Bob' AND StatusDate BEGINS_WITH 'PENDING'
  • 54. Needle in a sorted haystack Bob
  • 55. Sparse indexes Id (Partition) User Game Score Date Award 1 Bob G1 1300 2012-12-23 2 Bob G1 1450 2012-12-23 3 Jay G1 1600 2012-12-24 4 Mary G1 2000 2012-10-24 Champ 5 Ryan G2 123 2012-03-10 6 Jones G2 345 2012-03-20 Game-scores-table Award (Partition) Id User Score Champ 4 Mary 2000 Award-GSI Scan sparse GSIs
  • 56. Concatenate attributes to form useful secondary index keys Take advantage of sparse indexes Replace filter with indexes You want to optimize a query as much as possible Status + Date
  • 58. Partition 1 1000 WCUs Partition K 1000 WCUs Partition M 1000 WCUs Partition N 1000 WCUs Votes Table Candidate A Candidate B Scaling bottlenecks Voters Provision 200,000 WCUs
  • 59. Write sharding Candidate A_2 Candidate B_1 Candidate B_2 Candidate B_3 Candidate B_5 Candidate B_4 Candidate B_7 Candidate B_6 Candidate A_1 Candidate A_3 Candidate A_4 Candidate A_7 Candidate B_8 Candidate A_6 Candidate A_8 Candidate A_5 Voter Votes Table
  • 60. Write sharding Candidate A_2 Candidate B_1 Candidate B_2 Candidate B_3 Candidate B_5 Candidate B_4 Candidate B_7 Candidate B_6 Candidate A_1 Candidate A_3 Candidate A_4 Candidate A_7 Candidate B_8 UpdateItem: “CandidateA_” + rand(0, 10) ADD 1 to Votes Candidate A_6 Candidate A_8 Candidate A_5 Voter Votes Table
  • 61. Votes Table Shard aggregation Candidate A_2 Candidate B_1 Candidate B_2 Candidate B_3 Candidate B_5 Candidate B_4 Candidate B_7 Candidate B_6 Candidate A_1 Candidate A_3 Candidate A_4 Candidate A_5 Candidate A_6 Candidate A_8 Candidate A_7 Candidate B_8 Periodic process Candidate A Total: 2.5M 1. Sum 2. Store Voter
  • 62. Trade off read cost for write scalability Consider throughput per partition key Shard write-heavy partition keys Your write workload is not horizontally scalable
  • 64. Stream of updates to a table Asynchronous Exactly once Strictly ordered • Per item Highly durable • Scale with table 24-hour lifetime Subsecond latency DynamoDB Streams
  • 65. View type Destination Old image—before update Name = John, Destination = Mars New image—after update Name = John, Destination = Pluto Old and new images Name = John, Destination = Mars Name = John, Destination = Pluto Keys only Name = John View types UpdateItem (Name = John, Destination = Pluto)
  • 66. Stream Table Partition 1 Partition 2 Partition 3 Partition 4 Partition 5 Table Shard 1 Shard 2 Shard 3 Shard 4 KCL Worker KCL Worker KCL Worker KCL Worker Amazon Kinesis Client Library application DynamoDB client application Updates DynamoDB Streams and Amazon Kinesis Client Library
  • 67. DynamoDB Streams Open source cross-region replication library Asia Pacific (Sydney) EU (Ireland) replica US East (N. Virginia) Cross-region replication
  • 68. DynamoDB Streams and AWS Lambda
  • 69. Triggers Lambda function Notify change Derivative tables Amazon CloudSearch Amazon ElastiCache
  • 70. Analytics with DynamoDB Streams Collect and de-dupe data in DynamoDB Aggregate data in memory and flush periodically Performing real-time aggregation and analytics