DAT302 Under the Covers of Amazon DynamoDB - AWS re:Invent 2012

Under the Covers of Amazon DynamoDB

Matt Wood, Chief Data Scientist

Two decisions + three clicks
= ready for use

Level of throughput
Primary keys

Provisioned throughput
Data patterns

Amazon DynamoDB

DynamoDB is a managed NoSQL
database service

Store and retrieve any amount of data
Serve any level of request traffic

Without the operational burden

Consistent, predictable performance

Single digit millisecond latency
Backed by solid-state drives

Flexible data model

Key/attribute pairs. No schema required.
Easy to create. Easy to adjust.

Seamless scalability

No table size limits. Unlimited storage.
No downtime.

Durable

Consistent, disk-only writes
Replication across data centers
and availability zones

Without the operational burden

Focus on your app

Provisioned throughput

Reserve IOPS for reads and writes
Scale up or down at any time

Pay per capacity unit

Priced per hour of provisioned throughput

Write throughput

Provisioned units = size of item x writes per second
$0.01 per hour for 10 write units

Consistent writes

Atomic increment and decrement
Optimistic concurrency control: conditional writes
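
A minimal sketch of what a conditional write and an atomic counter look like from the caller's side. `ItemStore`, `put_if`, `add` and `ConditionFailed` are hypothetical names for a toy in-memory store, not the DynamoDB API; the point is only the check-then-write pattern behind optimistic concurrency control.

```python
class ConditionFailed(Exception):
    """Raised when an expected attribute value no longer matches."""


class ItemStore:
    """Toy in-memory stand-in for a table (assumption, not the service API)."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        return dict(self._items.get(key, {}))

    def put_if(self, key, new_item, expected):
        """Write new_item only if every attribute in `expected` still matches."""
        current = self._items.get(key, {})
        for name, value in expected.items():
            if current.get(name) != value:
                raise ConditionFailed(name)
        self._items[key] = dict(new_item)

    def add(self, key, attribute, delta):
        """Atomic increment (or decrement, with a negative delta)."""
        item = self._items.setdefault(key, {})
        item[attribute] = item.get(attribute, 0) + delta
        return item[attribute]


store = ItemStore()
store.put_if("mza", {"score": 10, "version": 1}, expected={})

# Two writers read the same version, then both try to update it.
seen_by_a = store.get("mza")
seen_by_b = store.get("mza")

store.put_if("mza", {"score": 20, "version": 2},
             expected={"version": seen_by_a["version"]})
try:
    store.put_if("mza", {"score": 30, "version": 2},
                 expected={"version": seen_by_b["version"]})
    second_write = "applied"
except ConditionFailed:
    second_write = "rejected"  # writer B has to re-read and retry

plays = store.add("mza", "plays", 1)
```

The second writer loses without any lock being held: it simply learns that its view was stale.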

Transactions

Item level transactions only
Puts, updates and deletes are ACID

Strong or eventual consistency

Read throughput

Provisioned units = size of item x reads per second
$0.01 per hour for 50 units

Strong or eventual consistency

Read throughput

Provisioned units = (size of item x reads per second) / 2

$0.01 per hour for 100 units
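
The capacity arithmetic on these slides can be worked through in a few lines. This is a sketch using the 2012 prices quoted above; measuring item size in KB and rounding it up to the next whole KB are assumptions, as are the function names.

```python
import math

WRITE_UNITS_PER_CENT = 10   # $0.01 per hour for 10 write units
READ_UNITS_PER_CENT = 50    # $0.01 per hour for 50 read units


def write_units(item_kb, writes_per_second):
    # Assumption: item size is counted in whole KB, rounded up.
    return math.ceil(item_kb) * writes_per_second


def read_units(item_kb, reads_per_second, consistent=True):
    units = math.ceil(item_kb) * reads_per_second
    # Eventually consistent reads need half the provisioned units.
    return units if consistent else math.ceil(units / 2)


def hourly_cost_cents(units, units_per_cent):
    return units / units_per_cent


# 1 KB items, 1000 writes and 1000 reads per second.
w = write_units(1, 1000)
r_strong = read_units(1, 1000, consistent=True)
r_eventual = read_units(1, 1000, consistent=False)

write_cost = hourly_cost_cents(w, WRITE_UNITS_PER_CENT)
strong_cost = hourly_cost_cents(r_strong, READ_UNITS_PER_CENT)
eventual_cost = hourly_cost_cents(r_eventual, READ_UNITS_PER_CENT)
```

Halving the units for eventually consistent reads is the same thing as the slide's "100 units for $0.01": the same money buys twice the reads per second.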


Read throughput

Same latency expectations
Mix and match at ‘read time’

Provisioned throughput is
managed by DynamoDB

Data is partitioned and managed
by DynamoDB

Achieving full provisioned throughput
requires a uniform workload

The DynamoDB Uniform Workload

DynamoDB divides table data into multiple partitions

Data is distributed primarily by primary key

Provisioned throughput is divided evenly across partitions

The DynamoDB Uniform Workload

To achieve and maintain full provisioned throughput
for a table, spread the workload evenly
across primary keys

Non-uniform workloads

Some requests might be throttled,
even at high levels of provisioned throughput
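
A rough simulation of why a skewed workload gets throttled even when the table as a whole has spare capacity. The partition count, the MD5-based placement and the one-second window are all assumptions made for illustration; the real partitioning scheme is internal to DynamoDB.

```python
import hashlib
from collections import Counter


def partition_for(key, partitions):
    # Assumption: keys are placed by hashing; MD5 is used here only
    # because it gives a stable, well-spread value.
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % partitions


def throttled_requests(keys, provisioned_per_second, partitions):
    """Count requests in one second that exceed their partition's share."""
    per_partition = provisioned_per_second // partitions
    load = Counter(partition_for(k, partitions) for k in keys)
    return sum(max(0, n - per_partition) for n in load.values())


uniform = ["user-%d" % i for i in range(1000)]   # 1000 distinct keys
hot = ["status-404"] * 1000                       # one key takes everything

uniform_throttled = throttled_requests(uniform, 2000, partitions=10)
hot_throttled = throttled_requests(hot, 2000, partitions=10)
```

Both workloads send 1000 requests against 2000 provisioned units. With a single hot key, every request lands on one partition that only owns 200 of those units, so 800 requests are over the limit.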

Model data for a uniform workload

DynamoDB semantics

Tables, items and attributes

id = 100    date = 2012-05-16-09-00-10    total = 25.00
id = 101    date = 2012-05-15-15-00-11    total = 35.00
id = 101    date = 2012-05-16-12-00-10    total = 100.00
id = 102    date = 2012-03-20-18-23-10    total = 20.00

Table: the full set of items above

Item: one row, for example
id = 101    date = 2012-05-15-15-00-11    total = 35.00

Attribute: one name/value pair within an item, for example total = 35.00

Items are indexed by primary key

Single hash keys and composite range keys

Hash key: id (100, 101, 102)

Range key: date (for example 2012-05-16-09-00-10)

Items are retrieved by primary key

Range keys for queries

For example: all items for November
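
A small sketch of how a composite key supports this kind of query: items with the same hash key are kept sorted by range key, so "everything for one month" is a prefix scan. It assumes `id` is the hash key and `date` the range key, and that each date string ends with the trailing number shown on the slides; `Table` and `query_begins_with` are hypothetical names, not the service API.

```python
import bisect
from collections import defaultdict


class Table:
    """Toy table: hash key -> items kept sorted by range key."""

    def __init__(self):
        self._ranges = defaultdict(list)   # hash key -> sorted range keys
        self._items = {}                   # (hash key, range key) -> item

    def put(self, hash_key, range_key, **attributes):
        if (hash_key, range_key) not in self._items:
            bisect.insort(self._ranges[hash_key], range_key)
        self._items[(hash_key, range_key)] = attributes

    def get(self, hash_key, range_key):
        return self._items[(hash_key, range_key)]

    def query_begins_with(self, hash_key, prefix):
        """Return (range_key, item) pairs whose range key starts with prefix."""
        keys = self._ranges.get(hash_key, [])
        start = bisect.bisect_left(keys, prefix)
        results = []
        for range_key in keys[start:]:
            if not range_key.startswith(prefix):
                break
            results.append((range_key, self._items[(hash_key, range_key)]))
        return results


orders = Table()
orders.put(100, "2012-05-16-09-00-10", total=25.00)
orders.put(101, "2012-05-15-15-00-11", total=35.00)
orders.put(101, "2012-05-16-12-00-10", total=100.00)
orders.put(102, "2012-03-20-18-23-10", total=20.00)

may_for_101 = orders.query_begins_with(101, "2012-05")
may_for_102 = orders.query_begins_with(102, "2012-05")
```

The query never touches items under other hash keys, which is why the choice of hash key decides both what can be queried cheaply and how the load spreads.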

Relationships are not hard coded,
but can be modeled

Players
user_id = mza         location = Cambridge    joined = 2011-07-04
user_id = jeffbarr    location = Seattle      joined = 2012-01-20
user_id = werner      location = Worldwide    joined = 2011-05-15

Scores (scores by user)
user_id = mza       game = angry-birds    score = 11,000
user_id = mza       game = tetris         score = 1,223,000
user_id = werner    game = bejewelled     score = 55,000

Leader boards (high scores by game)
game = angry-birds    score = 11,000       user_id = mza
game = tetris         score = 1,223,000    user_id = mza
game = tetris         score = 9,000,000    user_id = jeffbarr
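
One way to read the Players / Scores / Leader boards slides: the same score is written twice, once keyed for "scores by user" and once keyed for "high scores by game", so each question is answered by a key lookup rather than a join. The sketch below uses plain dictionaries as stand-ins for the tables; the function names are hypothetical.

```python
scores = {}         # (user_id, game) -> score
leader_boards = {}  # game -> list of (score, user_id), highest first


def record_score(user_id, game, score):
    """Write the same fact to both tables, one per access pattern."""
    scores[(user_id, game)] = score
    board = leader_boards.setdefault(game, [])
    board.append((score, user_id))
    board.sort(reverse=True)


def scores_by_user(user_id):
    return {game: s for (uid, game), s in scores.items() if uid == user_id}


def high_scores_by_game(game, limit=10):
    return leader_boards.get(game, [])[:limit]


record_score("mza", "angry-birds", 11000)
record_score("mza", "tetris", 1223000)
record_score("werner", "bejewelled", 55000)
record_score("jeffbarr", "tetris", 9000000)

mza_scores = scores_by_user("mza")
tetris_top = high_scores_by_game("tetris")
```

The relationship between the two tables lives in the application code, which is the sense in which relationships are modeled rather than hard coded.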



NoSQL data modeling for maximal
provisioned throughput

Distinct values for hash keys

Hash key elements should have a high
number of distinct values

Lots of unique user IDs: workload well distributed
user_id = mza         first_name = Matt      last_name = Wood
user_id = jeffbarr    first_name = Jeff      last_name = Barr
user_id = werner      first_name = Werner    last_name = Vogels
user_id = mattfox     first_name = Matt      last_name = Fox
...

Limited response codes: workload poorly distributed
status = 200    date = 2012-04-01-00-00-01
status = 404    date = 2012-04-01-00-00-01
status = 404    date = 2012-04-01-00-00-01
status = 404    date = 2012-04-01-00-00-01

NYT faбrik

AWS re:Invent – November 2012

Andrew Canaday & Michael Laing
New York Times Digital

What we’ll cover

• faбrik overview

• Getting more out of DynamoDB with python/boto
– More throughput / provisioned capacity
– Across more endpoints / table
– More reliably and controllably

Frank McCloud: “He wants more, don't you, Rocco?”

Johnny Rocco: “Yeah. That's it. More. That's right! I want more!”

James Temple: “Will you ever get enough?”

Frank McCloud: “Will you, Rocco?”

Johnny Rocco: “Well, I never have. No, I guess I won't.”


Takeaways

• Messaging infrastructure is cool (again)

• Old dogs have tricks you can apply
– The Internet is your friend
– BUT: much good computer science was done prior
– HENCE: not so readily findable

• Boto is great – clone and contribute!

NYT Mission

Enhance society by creating, collecting and distributing high quality
news, information and entertainment

- Distributing: publish / subscribe
- Collecting: gather / analyze
- High Quality: fast, reliable, accurate


faбrik

• Asynchronous Messaging Framework

• For client devices as well as our apps

• Enabled by:
– Websockets
– Robust message handling software
– Amazon Web Services

• Focusing on simple, common services

Typical Web Architecture

• Clients interact with front-end via load balancers

• Front-end makes requests to back-end on behalf of client

• Bottlenecks abound

• Information transfer is initiated by client

Typical Request Flow

[Diagram: Client, Load Balancer, Front End …, API …, Data …]

Typical Response Flow

[Diagram: Client, Load Balancer, Front End …, API …, Data …]

faбrik Web Architecture

• Clients interact with the nearest “App Buddy” front-end

• The “App Buddy” is connected to the “Bad Rabbit” backbone

• The “Bad Rabbit” backbone is clustered regionally and federated globally

• NYT content producers connect directly to the backbone

• Information flow is bidirectional and event-driven

faбrik Information Flow

[Diagram: Clients; NYT Internal; globally distributed “faбrik” layer]

faбrik – basic

[Diagram: App, App, App, Message Broker]

faбrik – basic

[Diagram: App, App, App, Message Broker]

Amazon Web Services
• EC2
• S3
• Identity & Access Mgt
• DynamoDB
• Route 53
…

faбrik – basic++

[Diagram: App Buddy, Service Buddy, Message Broker “Retail”, Message Broker “Wholesale”, Service Buddy, OtherApp]

faбrik: Current Implementation

• Open source:
– Erlang/OTP 14B04
– RabbitMQ 2.8.7/3.pre
– Nodejs 8.xx
– Sockjs (websockets +)
– Python 2.6/2.7
– ZeroMQ

• Automated deployment using CloudFormation

• DynamoDB & S3 for persistence

faбrik – active/active cluster

[Diagram: Region Wherever containing Zone ‘a’ and Zone ‘b’, with Service Buddy ‘a’ and Service Buddy ‘b’]

So why DynamoDB?

• faбrik services are reliable but stateless (mostly)

• A happy faбrik has short queues (measurable, by the way)

• So persist everything as rapidly as possible (enter DynamoDB)

• Plus we want to gather & analyze
– Pulse: Map / Reduce, rapid cycle
– Longitudinal analysis
– Complex Event Processing in parallel (maybe)

Note: the faбrik is asynchronous and facilitates parallelization

DynamoDB requirements

• Store all messages crossing each ‘virtual host’
Note: think of a ‘virtual host’ as a horizontal band of related, reliable
services/endpoints across zones/instances in a region
• Store log messages for all application and system instances
• Facilitate ‘burst’ loads as well as steady state
• Support gather / analyze for all of the above
• Generational storage: DDB to S3 to Glacier (with some weeding)
• Fairly allocate resources among many competing endpoints

Conventional wisdom…

Uh oh – we have an unpredictable mix of all these…


More conventional wisdom…

“In addition to simple retries, we recommend using an exponential backoff algorithm
for better flow control. The concept behind exponential backoff is to use progressively
longer waits between retries for consecutive error responses. For example, up to 50
milliseconds before the first retry, up to 100 milliseconds before the second, up to 200
milliseconds before third, and so on. However, after a minute, if the request has not
succeeded, the problem might be the request size exceeding your provisioned
throughput, and not the request rate. Set the maximum number of retries to stop around
one minute. If the request is not successful, investigate your provisioned throughput
options.” [i.e. increase provisioned throughput – hmmmm…]
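
The retry advice quoted above can be sketched as follows: the cap on the wait doubles after each throttled attempt, starting from 50 milliseconds, and retries stop once roughly a minute has been spent waiting. `ThroughputExceeded` and `call_with_backoff` are hypothetical names; the `sleep` and `rng` parameters exist only so the example can run without real delays.

```python
import random
import time


class ThroughputExceeded(Exception):
    """Hypothetical stand-in for the service's throttling error."""


def call_with_backoff(request, base_ms=50, max_total_ms=60000,
                      sleep=time.sleep, rng=random.random):
    """Retry `request`, waiting up to base_ms, 2*base_ms, 4*base_ms, ..."""
    waited_ms = 0.0
    attempt = 0
    while True:
        try:
            return request()
        except ThroughputExceeded:
            cap_ms = base_ms * (2 ** attempt)
            wait_ms = rng() * cap_ms           # "up to" the cap
            if waited_ms + wait_ms > max_total_ms:
                raise                          # give up after about a minute
            sleep(wait_ms / 1000.0)
            waited_ms += wait_ms
            attempt += 1


# Demo: a request that is throttled twice, then succeeds.
attempts = {"n": 0}


def flaky_request():
    attempts["n"] += 1
    if attempts["n"] <= 2:
        raise ThroughputExceeded()
    return "ok"


waits = []  # record the waits instead of sleeping
result = call_with_backoff(flaky_request, sleep=waits.append, rng=lambda: 1.0)
```

With `rng` pinned to 1.0 the recorded waits are the caps themselves; in real use the random factor spreads retries from competing callers apart.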


So…

• We would have to provision for peaks

• Exponential backoff would give us about a 1 minute buffer

• But! The faбrik does buffering and we can monitor queue lengths

• Plus we have asynchronous event scheduling/handling facilities built in…

First strategy

• With node.js, asynchronously blast all requests at DynamoDB, reschedule exponentially
based on backpressure

• This worked pretty well!
– DynamoDB would deliver about 3 times stated capacity in bursts
– Nothing got lost
– Converged reasonably onto table capacity
• But…
– Problems exerting backpressure on the faбrik from node.js… hence requests could get scheduled
WAY into the future… and WAY out of order
– Competition among endpoints was ‘unfair’ and fostered convergence problems

Current strategy

• Be smarter, look for similar patterns and tested solutions, plus select tools that give
the right level of control

• Old dog:
– “I remember when TCP was new and throughput was not very high…”
(time passes)
– “The ‘ThroughputExceeded’ backpressure from DynamoDB is sort of like TCP backpressure…”
(more time passes)
– “Perhaps we could leverage that thought by applying the research and practices that have
improved TCP etc. to our use of DynamoDB.”
(time for a nap)

Current strategy

• Be smarter, look for similar patterns and tested solutions, plus select tools that give
the right level of control

• Token Bucket (circa 1986) for traffic shaping
“…an algorithm used in packet switched computer networks and telecommunications networks to
check that data transmissions conform to defined limits on bandwidth and burstiness.” – Wikipedia
• Additive Increase/Multiplicative Decrease (AIMD)
“…a feedback control algorithm best known for its use in TCP congestion control…combines linear
growth of the congestion window with an exponential reduction when congestion takes place…flows
will eventually converge to use equal amounts of a contended link.” – Wikipedia
• Explicit Congestion Notification (ECN) etc. etc.
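
A minimal sketch of how the two ideas named above might fit together: a token bucket shapes the request rate, and an AIMD controller nudges that rate up on success and cuts it on throttling. This is not the faбrik code; class names, step sizes and the explicit `now` argument (used in place of a real clock) are assumptions for illustration.

```python
class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `rate` per second."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)
        self.updated = now

    def try_consume(self, now, tokens=1.0):
        # Refill according to elapsed time, never beyond capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False


class AimdRate:
    """Additive increase on success, multiplicative decrease on throttling."""

    def __init__(self, bucket, increase=1.0, decrease=0.5, floor=1.0):
        self.bucket = bucket
        self.increase = increase
        self.decrease = decrease
        self.floor = floor

    def on_success(self):
        self.bucket.rate += self.increase

    def on_throttle(self):
        self.bucket.rate = max(self.floor, self.bucket.rate * self.decrease)


bucket = TokenBucket(rate=10, capacity=5)
burst = [bucket.try_consume(now=0.0) for _ in range(6)]  # sixth is refused
after_refill = bucket.try_consume(now=0.5)               # tokens are back

control = AimdRate(bucket)
control.on_success()
control.on_success()
rate_after_successes = bucket.rate
control.on_throttle()    # e.g. on a throttling error from the service
rate_after_throttle = bucket.rate
```

Because every endpoint backs off by the same factor and grows by the same step, competing endpoints tend toward equal shares, which is the fairness property the first strategy lacked.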


Current strategy

• Be smarter, look for similar patterns and tested solutions, plus select reliable tools
that give the right level of control

• Tools:
– Use python to get a more mature and lower level event-driven interface (pika) to RabbitMQ –
easier to exert backpressure on the message source
– Use boto to get a mature interface to DynamoDB that can be easily ‘tweaked’ to give better
information about backpressure from DynamoDB (ThroughputExceeded exception)
– Use python’s concurrent futures to easily add asynchronous capability to boto, making use of
boto’s connection pooling
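
A sketch of the concurrent futures idea: blocking writes are handed to a thread pool, and throttled items are collected so the caller can reschedule them. The `put_item` callable and the `ThroughputExceeded` class are hypothetical stand-ins (no boto call is made here). `concurrent.futures` is in the Python 3 standard library; on the Python 2.6/2.7 versions listed earlier it was available as a separate backport.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class ThroughputExceeded(Exception):
    """Hypothetical stand-in for the throttling error raised by the client."""


def write_batch(put_item, items, max_workers=4):
    """Run blocking writes on a thread pool; return (stored, throttled)."""
    stored, throttled = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(put_item, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                future.result()
                stored.append(item)
            except ThroughputExceeded:
                throttled.append(item)   # candidates for rescheduling
    return stored, throttled


def fake_put(item):
    # Pretend every fifth write is throttled.
    if item % 5 == 0:
        raise ThroughputExceeded()
    return item


stored, throttled = write_batch(fake_put, list(range(10)))
```

Completion order is not deterministic, so callers that care about ordering need to sort or sequence the results themselves.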


Managed Access to DynamoDB

Placeholder

• Here we show some code, describe the testing methodology briefly, and show
generated results.

Thank you!

matthew@amazon.com
@mza

