DynamoDB

DynamoDB: Data Example
userId    date        value   unlockedAchievments
hadr-fb   18-07-2012  72      ['10 days', '2 levels day']
hadr-fb   19-07-2012  1       None
hadr-fb   20-07-2012  56789   ['top 10 progress']

Table: ‘Waldo-Scores’

Id     platform  Name      JoinDate    Score
hadr   fb        Hadrien   31-02-2011  10457
hadr   G+        Hadrien   18-07-2012  357
pior   fb        Pior      12-12-2012  18951

Table: ‘Players’

Data types (lean...)

Types

single
    string (UTF-8)
    number (between 10^-128 and 10^+126)
set
    string (UTF-8)
    number

Constraints

no “embedded documents”
no complex types (dates, . . . )

Dimensioning 1/2: Big picture

Units

units = accesses/s ∗ roundUp(KB per item)
provisioning
updates are... constraining
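As a rough illustration of the unit formula above, here is a minimal sketch assuming one unit covers one access per second on an item of up to 1 KB, so item sizes are rounded up to the next KB. The function name `required_units` is hypothetical.

```python
import math

def required_units(accesses_per_s: float, item_size_kb: float) -> int:
    """Units to provision: accesses/s * roundUp(KB per item).

    Assumption: 1 unit = 1 access per second on an item of up to 1 KB,
    so a 1.5 KB item costs 2 units per access.
    """
    if item_size_kb <= 0:
        raise ValueError("item size must be positive")
    return math.ceil(accesses_per_s * math.ceil(item_size_kb))

# 50 writes/s of a 1.5 KB item -> 50 * 2 units
print(required_units(50, 1.5))   # 100
```

Rounding happens per item, which is why many small items can cost more than a few larger ones holding the same data.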

Storage

tables are “elastic”
64KB max per item
overhead = 100 bytes per item
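A rough storage estimate can be sketched from the two numbers above (64 KB max per item, 100 bytes of overhead per item). The size function is an approximation made up for this example: it counts the UTF-8 length of attribute names and of each value's text form, which is not necessarily how the service measures items.

```python
MAX_ITEM_BYTES = 64 * 1024   # 64 KB max per item (from the slide)
OVERHEAD_BYTES = 100         # per-item overhead (from the slide)

def item_size_bytes(item: dict) -> int:
    """Approximate item size: UTF-8 length of names plus values.

    Assumption: numbers count as their decimal text, sets/lists as the
    sum of their members; the real encoding may differ.
    """
    size = 0
    for name, value in item.items():
        size += len(name.encode("utf-8"))
        members = value if isinstance(value, (set, frozenset, list)) else [value]
        for member in members:
            size += len(str(member).encode("utf-8"))
    return size

def billed_storage_bytes(items: list) -> int:
    """Table storage estimate: each item's size plus the per-item overhead."""
    total = 0
    for item in items:
        size = item_size_bytes(item)
        if size > MAX_ITEM_BYTES:
            raise ValueError(f"item of {size} bytes exceeds the 64 KB limit")
        total += size + OVERHEAD_BYTES
    return total

row = {"userId": "hadr-fb", "date": "18-07-2012", "value": 72}
print(billed_storage_bytes([row]))   # 134: 34 bytes of data + 100 overhead
```

For small items like the score rows above, the fixed overhead dominates the stored size.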

Dimensioning 2/2: Traps and constraints
TRAPS:

Provisioned units are split evenly among the table's partitions.
Bigger tables (more partitions) often need higher provisioned throughput. Split tables?
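The trap can be made concrete with a small sketch, assuming units are split evenly and that the partition count is known; here it is simply an input, since the slide does not say how to obtain it. Both function names are hypothetical.

```python
def units_per_partition(provisioned_units: int, partitions: int) -> float:
    """Throughput available to one partition when units are split evenly."""
    if partitions < 1:
        raise ValueError("need at least one partition")
    return provisioned_units / partitions

def max_hot_key_rate(provisioned_units: int, partitions: int,
                     units_per_access: int = 1) -> float:
    """Accesses/s a single hot hash key can sustain: it lives on one partition only."""
    return units_per_partition(provisioned_units, partitions) / units_per_access

# 1,000 units over 10 partitions: one hot key only gets ~100 accesses/s
print(max_hot_key_rate(1000, 10))   # 100.0
```

This is why a popular key (e.g. one very active player) can be throttled even though the table as a whole is far below its provisioned throughput.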

CONSTRAINTS for throughput:

absolute
    min 5
    max 10,000
    only 1 table in UPDATING state at a time
increase
    min 10%
    max 100%
decrease
    min 10%
    max once a day
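These rules can be checked before sending an update. The sketch below simply encodes the limits listed above as quoted on the slide; they may not match the service's current limits, and `check_throughput_update` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import Optional

MIN_UNITS, MAX_UNITS = 5, 10_000   # absolute limits quoted on the slide

def check_throughput_update(current: int, new: int, now: datetime,
                            last_decrease: Optional[datetime] = None,
                            other_table_updating: bool = False) -> list:
    """Return the slide constraints a proposed update violates (empty list = OK)."""
    errors = []
    if not MIN_UNITS <= new <= MAX_UNITS:
        errors.append(f"units must be between {MIN_UNITS} and {MAX_UNITS}")
    if other_table_updating:
        errors.append("another table is already in UPDATING state")
    change = (new - current) / current          # relative change, e.g. 0.5 = +50%
    if new > current:
        if change < 0.10:
            errors.append("increase must be at least 10%")
        if change > 1.00:
            errors.append("increase must be at most 100%")
    elif new < current:
        if -change < 0.10:
            errors.append("decrease must be at least 10%")
        if last_decrease is not None and now - last_decrease < timedelta(days=1):
            errors.append("only one decrease per day")
    return errors

now = datetime(2012, 7, 18)
print(check_throughput_update(100, 250, now))   # ['increase must be at most 100%']
```

Going from 100 to 400 units therefore takes two successive updates (100 to 200, then 200 to 400), each waiting for the table to leave the UPDATING state.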

Integrated Service 1/3: IAM

permissions per API action
permissions per table (except for “ListTables”)

Example: “fair” use of the Scores table (no Scan)

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["dynamodb:DeleteItem", "dynamodb:PutItem",
               "dynamodb:UpdateItem", "dynamodb:GetItem",
               "dynamodb:Query"],
    "Resource": "arn:aws:dynamodb:<region>:<account>:table/Scores"
  }]
}

Integrated Service 2/3: CloudWatch
Metrics:

SuccessfulRequestLatency
UserErrors
SystemErrors
ThrottledRequests
ConsumedReadCapacityUnits
ConsumedWriteCapacityUnits
ReturnedItemCount

Metric context (dimensions)

Table
Operation ({Put, Delete, Update, Get, BatchGet}Item, Scan, Query)
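To read one of these metrics for a given table (and optionally one operation), a CloudWatch GetMetricStatistics request needs the `AWS/DynamoDB` namespace plus the `TableName` and `Operation` dimensions. The sketch below only builds the request arguments, so it runs without AWS access. Sending them with boto3's `get_metric_statistics` is an assumption that swaps in the newer SDK instead of the boto 2 calls used elsewhere in these slides; the helper name and the 5-minute period are illustrative choices.

```python
from datetime import datetime, timedelta, timezone

METRICS = {
    "SuccessfulRequestLatency", "UserErrors", "SystemErrors", "ThrottledRequests",
    "ConsumedReadCapacityUnits", "ConsumedWriteCapacityUnits", "ReturnedItemCount",
}

def metric_request(table, metric, operation=None, hours=1, end=None):
    """Keyword arguments for a CloudWatch GetMetricStatistics call on one table."""
    if metric not in METRICS:
        raise ValueError(f"unknown DynamoDB metric: {metric}")
    end = end or datetime.now(timezone.utc)
    dimensions = [{"Name": "TableName", "Value": table}]
    if operation is not None:                    # e.g. "GetItem", "Query", "Scan"
        dimensions.append({"Name": "Operation", "Value": operation})
    return {
        "Namespace": "AWS/DynamoDB",
        "MetricName": metric,
        "Dimensions": dimensions,
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,                           # 5-minute buckets
        "Statistics": ["Sum", "Average"],
    }

# With boto3 installed (not needed to run this sketch):
#   boto3.client("cloudwatch").get_metric_statistics(
#       **metric_request("Waldo-Scores", "ThrottledRequests"))
```

Watching `ThrottledRequests` next to the consumed-capacity metrics is a simple way to tell whether the provisioned units are too low.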

Integrated Service 3/3: EMR

out of the scope of this presentation
basically, Hive integrated with DynamoDB => HiveQL

use cases:

custom index generation
export to S3 (backup, data removal)
data analysis / aggregation

Data access 1/3: GetItem

Fastest: primary key(s)
0-1 item
Cost = 1 unit

Example: player ‘Hadrien’ on the ‘fb’ platform

import boto

conn = boto.connect_dynamodb()       # boto 2 DynamoDB connection
table = conn.get_table('Players')
item = table.get_item(
    hash_key='hadr',                 # Id
    range_key='fb'                   # platform
)

Data access 2/3: Query

Fast
primary key
range key conditions: =, <, >, <=, >=, begins with
0+ item(s)
Cost = 1 unit per returned item

Example: all ‘Waldo-Scores’ items of player ‘hadr-fb’

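table = conn.get_table('Waldo-Scores')   # conn = boto.connect_dynamodb()
items = table.query(
    hash_key='hadr-fb',
    # range_key_condition=...,  # optional condition on the range key (date)
)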

Data access 3/3: Scan

Slooooow
filter on any attribute
reads the WHOLE table!
0+ item(s)
Cost = 1/2 unit per scanned KB! => starves other requests
Use case: get a full (small) table. Ex: ‘powerups’
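The starvation effect follows directly from the cost rule above (1/2 unit per scanned KB). This sketch estimates the units a full Scan consumes and the shortest time it can take when it is allowed to use the whole read provision; the function name and the example sizes are made up for illustration.

```python
import math

def scan_cost(table_size_kb: float, read_units_per_s: int):
    """Units consumed by a full Scan (1/2 unit per scanned KB, per the slide)
    and the minimum duration if it may use the entire read provision."""
    units = math.ceil(table_size_kb) * 0.5
    seconds = units / read_units_per_s
    return units, seconds

# A 100 MB table with 500 read units/s: 51,200 units, about 102 s of ALL reads
units, seconds = scan_cost(100 * 1024, 500)
print(units, seconds)
```

During those seconds every other read on the table competes for the same units, which is why a Scan is only reasonable on small tables such as ‘powerups’.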

Example: all days where ‘hadr-*’ players scored more than 100

from boto.dynamodb.condition import BEGINS_WITH, GT

table = conn.get_table('Waldo-Scores')
items = table.scan(scan_filter={
    'userId': BEGINS_WITH('hadr-'),
    'value': GT(100),
})

Performance considerations: non indexed data 1/2

De-normalisation

Ex: the Waldo-Scores and Players tables :)
big picture: duplicate data to fit each view point / need
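A minimal sketch of the idea, using in-memory dicts as stand-ins for the two tables. It assumes, purely for illustration, that Players.Score holds a running total of the daily values; the slides do not say how the two tables are related beyond the shared player id. `record_score` is a hypothetical helper.

```python
# Hypothetical in-memory stand-ins for the two tables
waldo_scores = {}   # (userId, date) -> item
players = {}        # (Id, platform) -> item

def record_score(player_id, platform, date, value):
    """Write the daily score AND duplicate it into the player's total,
    so reading a player's score is one key lookup instead of a Scan."""
    user_id = f"{player_id}-{platform}"          # e.g. 'hadr-fb'
    waldo_scores[(user_id, date)] = {"userId": user_id, "date": date, "value": value}
    player = players.setdefault((player_id, platform),
                                {"Id": player_id, "platform": platform, "Score": 0})
    player["Score"] += value                     # denormalised copy

record_score("hadr", "fb", "18-07-2012", 72)
record_score("hadr", "fb", "19-07-2012", 1)
print(players[("hadr", "fb")]["Score"])          # 73
```

The price is two writes (and two sets of write units) per score, and the copies can drift apart if one of the writes fails.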

Performance considerations: non indexed data 2/2

Scan

sloooooow (sequential)
(bad) unit consumption (sequential)

EMR

scales (less slow :p)
(better) unit consumption (parallel)

TL;DR
Index your data!

Eventual vs strong consistency

write => propagation ∼ 1s
read => may not be up to date...

Consistency   Applications        Cost (units)   Performance
strong        critical            1 per KB       good
eventual      consistency-aware   1/2 per KB     maximal
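The cost column translates into provisioning as follows. This is a sketch assuming item sizes are rounded up to the next KB before the consistency factor is applied; `read_units` is a hypothetical name.

```python
import math

def read_units(item_size_kb: float, reads_per_s: float, consistent: bool) -> float:
    """Read units: 1 per KB (rounded up) for strong reads, half that for eventual ones."""
    per_read = math.ceil(item_size_kb) * (1.0 if consistent else 0.5)
    return per_read * reads_per_s

# 200 reads/s of a 2.5 KB item (rounded up to 3 KB)
print(read_units(2.5, 200, consistent=True))    # 600.0
print(read_units(2.5, 200, consistent=False))   # 300.0
```

For read-heavy tables, accepting eventual consistency halves the read provision for the same traffic.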

Critical/specific applications

Redundancy/backup

managed => no need
“∼ Snapshot” => EMR + S3

∼ Transactions

conditional operations (idempotent)
atomic counter (NOT idempotent, BUT strongly consistent)
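The difference between the two write styles shows up on retries. The sketch below uses a hypothetical in-memory `FakeTable` rather than a real DynamoDB client: `put_if` mimics a PutItem with an expected-value condition, and `add` mimics an UpdateItem ADD counter.

```python
class FakeTable:
    """Hypothetical in-memory stand-in for a table, to show two write behaviours."""

    def __init__(self):
        self.items = {}

    def put_if(self, key, item, expected=None):
        """Conditional put: only write if the stored item equals `expected`
        (None means 'must not exist yet')."""
        if self.items.get(key) != expected:
            raise RuntimeError("ConditionalCheckFailed")
        self.items[key] = item

    def add(self, key, attr, delta):
        """Atomic counter: no read-modify-write race,
        but every call (including a retry) adds `delta` again."""
        item = self.items.setdefault(key, {})
        item[attr] = item.get(attr, 0) + delta
        return item[attr]


table = FakeTable()
table.put_if("hadr", {"Score": 357})
try:
    table.put_if("hadr", {"Score": 357})   # replay of the same create
except RuntimeError:
    pass                                    # rejected: state unchanged, safe to retry

table.add("hadr-fb", "value", 1)
table.add("hadr-fb", "value", 1)            # a retried increment counts twice
print(table.items["hadr-fb"]["value"])      # 2
```

A replayed conditional write fails its check instead of applying twice, whereas a counter increment that is retried after a timeout may be counted twice.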

API 1/3: Read

Method         Consistency       Description                  Returns
GetItem        eventual/strong   load by key                  0-1 item
BatchGetItem   eventual/strong   same, several keys at once   0-100 items, 0-1MB
Query          eventual/strong   range key filter             0+ items, 0-1MB
Scan           eventual          filter on any attribute      0+ items, 0-1MB

rule: at most 1 condition per eligible key
unprocessed => ‘UnprocessedKeys’ (BatchGetItem), ‘LastEvaluatedKey’ (Query, Scan)
consumed units => ‘ConsumedCapacityUnits’
enforce strong consistency => ‘ConsistentRead’
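Because Query and Scan stop at 1 MB, a complete read means looping on ‘LastEvaluatedKey’ and passing it back as the start key of the next request (‘ExclusiveStartKey’ in the low-level API). In this sketch `fetch_page` is a hypothetical callable standing in for one request, so the loop can run without AWS.

```python
def fetch_all(fetch_page):
    """Follow 'LastEvaluatedKey' until a Query/Scan is exhausted.

    `fetch_page(start_key)` is a hypothetical stand-in for one Query/Scan
    request (start_key is sent as 'ExclusiveStartKey'); it returns a dict
    with 'Items', optional 'ConsumedCapacityUnits' and, when the 1 MB
    limit was hit, 'LastEvaluatedKey'.
    """
    items, start_key, consumed = [], None, 0.0
    while True:
        page = fetch_page(start_key)
        items.extend(page["Items"])
        consumed += page.get("ConsumedCapacityUnits", 0)
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:                # no more pages
            return items, consumed

PAGES = {
    None: {"Items": ["a", "b"], "LastEvaluatedKey": "k2", "ConsumedCapacityUnits": 1.0},
    "k2": {"Items": ["c"], "ConsumedCapacityUnits": 0.5},
}
items, units = fetch_all(lambda key: PAGES[key])
print(items, units)
```

Summing ‘ConsumedCapacityUnits’ across pages gives the real cost of the whole read.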

API 2/3: Edit

Method           Operation               Conditional   Changes
PutItem          create/replace          yes           1 item
DeleteItem       delete                  yes           1 item, 0-1MB
BatchWriteItem   create/replace/delete   no            1-25 items
UpdateItem       create/update/delete    yes           1+ fields, 1 item

not processed / failure => ‘UnprocessedItems’
condition failed => ‘ConditionalCheckFailedException’
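Since BatchWriteItem takes 1 to 25 items and may hand some back as ‘UnprocessedItems’, a bulk write is usually chunked and retried. In this sketch `send_batch` is a hypothetical callable standing in for one BatchWriteItem call that returns the unprocessed requests; the retry count and back-off delays are arbitrary choices.

```python
import time

BATCH_LIMIT = 25   # BatchWriteItem accepts 1-25 items per call

def batch_write(requests, send_batch, max_retries=5, delay=0.05):
    """Write `requests` in chunks of 25, re-sending whatever comes back unprocessed.

    `send_batch(chunk)` is a hypothetical stand-in for one BatchWriteItem call:
    it returns the requests that were NOT processed ('UnprocessedItems').
    """
    for start in range(0, len(requests), BATCH_LIMIT):
        pending = requests[start:start + BATCH_LIMIT]
        attempt = 0
        while True:
            pending = send_batch(pending)
            if not pending:
                break
            attempt += 1
            if attempt > max_retries:
                raise RuntimeError(f"{len(pending)} items still unprocessed")
            time.sleep(delay * 2 ** attempt)   # exponential back-off before retrying
```

Backing off matters here: unprocessed items usually mean the table is at its provisioned throughput, so retrying immediately just gets throttled again.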

API 3/3: Structure

Method          Asynchronous   Description
CreateTable     yes            create table, provision units
DeleteTable     yes            self-explanatory
DescribeTable   no             read size, status, throughput
ListTables      no             list table names, starting after a given name
UpdateTable     yes            update provisioned throughput

a table in “DELETING” state may still answer requests until it is actually deleted
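Because CreateTable, DeleteTable and UpdateTable are asynchronous, callers typically poll DescribeTable until the table status settles. This sketch takes a hypothetical `describe` callable returning the status string, so it can run without AWS; the timeout and poll interval are arbitrary defaults.

```python
import time

def wait_for_status(describe, table, wanted="ACTIVE", timeout_s=300, poll_s=5):
    """Poll a DescribeTable-like call until the table reaches `wanted`.

    `describe(table)` is a hypothetical callable returning the table status
    (e.g. CREATING, UPDATING, DELETING, ACTIVE). Returns the seconds waited.
    """
    waited = 0.0
    while True:
        status = describe(table)
        if status == wanted:
            return waited
        if waited >= timeout_s:
            raise TimeoutError(f"{table} still {status} after {waited:.0f}s")
        time.sleep(poll_s)
        waited += poll_s
```

The same loop covers the throughput constraints seen earlier: a second UpdateTable has to wait until the first one has brought the table back to ACTIVE.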

TL;DR Let’s make it short :)

Amazon
    scalable
    fully integrated
Constraints
    throughput provisioning
    index matters
