SlideShare une entreprise Scribd logo
1  sur  56
Télécharger pour lire hors ligne
Under the covers
with Dynamo DB
Ian Meyers
IP Expo 2013
RDS

Dynamo
DB

Redshift

Deployment & Administration
App Services
Fully Managed, Provisioned throughput NoSQL
Compute

Storage

Database

Networking
AWS Global Infrastructure

database
Fast, predictable, configurable performance
Fully distributed, fault tolerant HA architecture
Integration with EMR & Hive
Consistent, predictable performance.
Single digit millisecond latency.
Backed by solid-state drives.
Flexible data model.
Key/attribute pairs. No schema required.

Easy to create. Easy to adjust.
Seamless scalability.
No table size limits. Unlimited storage.
No downtime.
Durable.
Consistent, disk only writes.
Replication across data centers and availability zones.
Without the operational burden.
No Cluster to Manage
No HA to Manage
Consistent writes.
Atomic increment and decrement.
Optimistic concurrency control: conditional writes.
Transactions.
Native item level transactions only.
Puts, updates and deletes are ACID.
Transaction API for Java.
Three decisions + three clicks
= ready for use
Level of throughput
Primary keys
Secondary Indexes

Three decisions + three clicks
= ready for use
Level of throughput
Primary keys
Secondary Indexes

Three decisions + three clicks
= ready for use
Provisioned throughput.
Reserve IOPS for reads and writes.
Scale up for down at any time.
Pay per capacity unit.
Priced per hour of provisioned throughput.
Write throughput.
Size of item x writes per second

$0.0065 for 10 write units
Strong or eventual consistency

Read throughput.
Read data will reflect all previous transactions.
$0.0065 per hour for 50 units.
Strong or eventual consistency

Read throughput.
Read data may reflect old values.
$0.0065 per hour for 100 units.
Strong or eventual consistency

Read throughput.

Same latency expectations.
Mix and match at ‘read time’.
Provisioned throughput is
managed by DynamoDB.
Data is partitioned and
managed by DynamoDB.
Reserved capacity.

Up to 53% for 1 year reservation.
Up to 76% for 3 year reservation.
Authentication.
Session based to minimize latency.
Uses the Amazon Security Token Service.
Handled by AWS SDKs.
Integrates with IAM.
Monitoring.
CloudWatch metrics:
latency, consumed read and write throughput,
errors and throttling.
Indexing.
Items are indexed by primary and secondary keys.

Primary keys can be composite.
Secondary keys index on other attributes.
ID

Date

Total

id = 100

date = 2012-05-16-09-00-10

total = 25.00

id = 101

date = 2012-05-15-15-00-11

total = 35.00

id = 101

date = 2012-05-16-12-00-10

total = 100.00

id = 102

date = 2012-03-20-18-23-10

total = 20.00

id = 102

date = 2012-03-20-18-23-10

total = 120.00
Hash key

ID

Date

Total

id = 100

date = 2012-05-16-09-00-10

total = 25.00

id = 101

date = 2012-05-15-15-00-11

total = 35.00

id = 101

date = 2012-05-16-12-00-10

total = 100.00

id = 102

date = 2012-03-20-18-23-10

total = 20.00

id = 102

date = 2012-03-20-18-23-10

total = 120.00
Hash key

Range key

ID

Date

Total

Composite primary key

id = 100

date = 2012-05-16-09-00-10

total = 25.00

id = 101

date = 2012-05-15-15-00-11

total = 35.00

id = 101

date = 2012-05-16-12-00-10

total = 100.00

id = 102

date = 2012-03-20-18-23-10

total = 20.00

id = 102

date = 2012-03-20-18-23-10

total = 120.00
Hash key

Range key

Secondary range key

ID

Date

Total

id = 100

date = 2012-05-16-09-00-10

total = 25.00

id = 101

date = 2012-05-15-15-00-11

total = 35.00

id = 101

date = 2012-05-16-12-00-10

total = 100.00

id = 102

date = 2012-03-20-18-23-10

total = 20.00

id = 102

date = 2012-03-20-18-23-10

total = 120.00
Conditional updates.
PutItem, UpdateItem, DeleteItem can take
optional conditions for operation.
UpdateItem performs atomic increments.
One API call, multiple items
BatchGet returns multiple items by key.

BatchWrite performs up to 25 put or delete operations.
Throughput is measured by IO, not API calls.
Query vs Scan
Query for Composite Key queries.
Scan for full table scans, exports.

Both support pages and limits.
Maximum response is 1Mb in size.
Unlimited storage.
Unlimited attributes per item.
Unlimited items per table.
Maximum of 64k per item.
Split across items.

message_id = 1

part = 1

message =
<first 64k>

message_id = 1

part = 2

message =
<second 64k>

message_id = 1

part = 3

joined =
<third 64k>
Store a pointer to S3.

message_id = 1

message =
http://s3.amazonaws.com...

message_id = 2

message =
http://s3.amazonaws.com...

message_id = 3

message =
http://s3.amazonaws.com...
Time Series Data
Separate Data by Throughput Required
Hot Data in Tables with High Provisioned IO
Older Data in Tables with Low IO
December

January

February

March

April
April - 1000 Read IOPS, 1000 Write IOPS

Hot and cold tables.

event_id =
1000

timestamp =
2013-04-16-09-59-01

key =
value

event_id =
1001

timestamp =
2013-04-16-09-59-02

key =
value

event_id =
1002

timestamp =
2013-04-16-09-59-02

key =
value

March - 200 Read IOPS, 1 Write IOPS
event_id =
1000

timestamp =
2013-03-01-09-59-01

key =
value

event_id =
1001

timestamp =
2013-03-01-09-59-02

key =
value

event_id =
1002

timestamp =
2013-03-01-09-59-02

key =
value
Archive data.
Move old data to S3: lower cost.
Still available for analytics.
Run queries across hot and cold data
with Elastic MapReduce.
Partitioning
Uniform workload.
Data stored across multiple partitions.
Data is primarily distributed by primary key.

Provisioned throughput is divided evenly across partitions.
To achieve and maintain full
provisioned throughput, spread
workload evenly across hash keys.
Non-Uniform workload.
Might be throttled, even at high levels of throughput.
BEST PRACTICE 1:

Distinct values for hash keys.
Hash key elements should have a
high number of distinct values.
Lots of users with unique user_id.
Workload well distributed across hash key.
user_id =
mza

first_name =
Matt

last_name =
Wood

user_id =
jeffbarr

first_name =
Jeff

last_name =
Barr

user_id =
werner

first_name =
Werner

last_name =
Vogels

user_id =
simone

first_name =
Simone

last_name =
Brunozzi

...

...

...
BEST PRACTICE 2:

Avoid limited hash key values.
Hash key elements should have a
high number of distinct values.
Small number of status codes.
Unevenly, non-uniform workload.
status =
200

date =
2012-04-01-00-00-01

status =
404

date =
2012-04-01-00-00-01

status
404

date =
2012-04-01-00-00-01

status =
404

date =
2012-04-01-00-00-01
BEST PRACTICE 3:

Model for even distribution.
Access by hash key value should be evenly
distributed across the dataset.
Large number of devices.
Small number which are much more popular than others.
Workload unevenly distributed.
mobile_id =
100

access_date =
2012-04-01-00-00-01

mobile_id =
100

access_date =
2012-04-01-00-00-02

mobile_id =
100

access_date =
2012-04-01-00-00-03

mobile_id =
100

access_date =
2012-04-01-00-00-04

...

...
Sample access pattern.
Workload randomized by hash key.
mobile_id =
100.1

access_date =
2012-04-01-00-00-01

mobile_id =
100.2

access_date =
2012-04-01-00-00-02

mobile_id =
100.3

access_date =
2012-04-01-00-00-03

mobile_id =
100.4

access_date =
2012-04-01-00-00-04

...

...
BEST PRACTICE 4:

Distribute scans across dataset
Improve retrieval times by scanning partitions
concurrently using the Parallel Scan feature.
Parallel Scan: separate thread for each table segment

Worker
Thread 0

Application
Main Thread

Worker
Thread 1

Worker
Thread 2
Reporting & Analytics
Seamless scale.

Scalable methods for data processing.
Scalable methods for backup/restore.
Amazon Elastic MapReduce.
Managed Hadoop service for
data-intensive workflows.

aws.amazon.com/emr
create external table items_db
(id string, votes bigint, views bigint) stored by
'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
tblproperties
("dynamodb.table.name" = "items",
"dynamodb.column.mapping" =
"id:id,votes:votes,views:views");
select id, likes, views
from items_db
order by views desc;

Contenu connexe

Tendances

Tendances (6)

Getting started with Amazon Redshift
Getting started with Amazon RedshiftGetting started with Amazon Redshift
Getting started with Amazon Redshift
 
SRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBSRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDB
 
Cassandra Community Webinar | Getting Started with Apache Cassandra with Patr...
Cassandra Community Webinar | Getting Started with Apache Cassandra with Patr...Cassandra Community Webinar | Getting Started with Apache Cassandra with Patr...
Cassandra Community Webinar | Getting Started with Apache Cassandra with Patr...
 
Amazon Aurora: Amazon’s New Relational Database Engine
Amazon Aurora: Amazon’s New Relational Database EngineAmazon Aurora: Amazon’s New Relational Database Engine
Amazon Aurora: Amazon’s New Relational Database Engine
 
Amazon Glacier vs Amazon S3
Amazon Glacier vs Amazon S3Amazon Glacier vs Amazon S3
Amazon Glacier vs Amazon S3
 
Amazon S3: Updates and Best Practices - SRV301 - Chicago AWS Summit
Amazon S3: Updates and Best Practices - SRV301 - Chicago AWS SummitAmazon S3: Updates and Best Practices - SRV301 - Chicago AWS Summit
Amazon S3: Updates and Best Practices - SRV301 - Chicago AWS Summit
 

En vedette

AWS Partner Presentation - Suse Linux Proven Cloud Success
AWS Partner Presentation - Suse Linux Proven Cloud SuccessAWS Partner Presentation - Suse Linux Proven Cloud Success
AWS Partner Presentation - Suse Linux Proven Cloud Success
Amazon Web Services
 
AWS Cloud Kata 2013 | Singapore - Opening Keynote: Running Lean & Scaling Fas...
AWS Cloud Kata 2013 | Singapore - Opening Keynote: Running Lean & Scaling Fas...AWS Cloud Kata 2013 | Singapore - Opening Keynote: Running Lean & Scaling Fas...
AWS Cloud Kata 2013 | Singapore - Opening Keynote: Running Lean & Scaling Fas...
Amazon Web Services
 

En vedette (20)

Dynamo DB & RDS Deep Dive - AWS India Summit 2012
Dynamo DB & RDS Deep Dive - AWS India Summit 2012Dynamo DB & RDS Deep Dive - AWS India Summit 2012
Dynamo DB & RDS Deep Dive - AWS India Summit 2012
 
Getting Strated with Amazon Dynamo DB (Jim Scharf) - AWS DB Day
Getting Strated with Amazon Dynamo DB (Jim Scharf) - AWS DB DayGetting Strated with Amazon Dynamo DB (Jim Scharf) - AWS DB Day
Getting Strated with Amazon Dynamo DB (Jim Scharf) - AWS DB Day
 
Empowering Publishers - Unlocking the power of Amazon Web Services - May-15-2...
Empowering Publishers - Unlocking the power of Amazon Web Services - May-15-2...Empowering Publishers - Unlocking the power of Amazon Web Services - May-15-2...
Empowering Publishers - Unlocking the power of Amazon Web Services - May-15-2...
 
(DEV303) Practical DynamoDB Programming in Java
(DEV303) Practical DynamoDB Programming in Java(DEV303) Practical DynamoDB Programming in Java
(DEV303) Practical DynamoDB Programming in Java
 
Masterclass Live: Amazon EC2
Masterclass Live: Amazon EC2 Masterclass Live: Amazon EC2
Masterclass Live: Amazon EC2
 
Understanding AWS Security
Understanding AWS SecurityUnderstanding AWS Security
Understanding AWS Security
 
AWS Partner Presentation - Sonian
AWS Partner Presentation - SonianAWS Partner Presentation - Sonian
AWS Partner Presentation - Sonian
 
AWS Webcast - Library Storage Webinar
AWS Webcast - Library Storage WebinarAWS Webcast - Library Storage Webinar
AWS Webcast - Library Storage Webinar
 
AWS Partner Presentation - Suse Linux Proven Cloud Success
AWS Partner Presentation - Suse Linux Proven Cloud SuccessAWS Partner Presentation - Suse Linux Proven Cloud Success
AWS Partner Presentation - Suse Linux Proven Cloud Success
 
Leveraging Hybid IT for More Robust Business Services
Leveraging Hybid IT for More Robust Business ServicesLeveraging Hybid IT for More Robust Business Services
Leveraging Hybid IT for More Robust Business Services
 
AWS for Start-ups - Case Study - Go Squared
AWS for Start-ups - Case Study - Go SquaredAWS for Start-ups - Case Study - Go Squared
AWS for Start-ups - Case Study - Go Squared
 
AWS Webcast - Backup and Archiving in the AWS Cloud
AWS Webcast - Backup and Archiving in the AWS CloudAWS Webcast - Backup and Archiving in the AWS Cloud
AWS Webcast - Backup and Archiving in the AWS Cloud
 
AWS Sydney Summit 2013 - Building Web Scale Applications with AWS
AWS Sydney Summit 2013 - Building Web Scale Applications with AWSAWS Sydney Summit 2013 - Building Web Scale Applications with AWS
AWS Sydney Summit 2013 - Building Web Scale Applications with AWS
 
AWS Sydney Summit 2013 - Technical Lessons on How to do DR in the Cloud
AWS Sydney Summit 2013 - Technical Lessons on How to do DR in the CloudAWS Sydney Summit 2013 - Technical Lessons on How to do DR in the Cloud
AWS Sydney Summit 2013 - Technical Lessons on How to do DR in the Cloud
 
Workshop part3 – IOT
Workshop part3 – IOTWorkshop part3 – IOT
Workshop part3 – IOT
 
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
 
AWS Cloud Kata 2013 | Singapore - Opening Keynote: Running Lean & Scaling Fas...
AWS Cloud Kata 2013 | Singapore - Opening Keynote: Running Lean & Scaling Fas...AWS Cloud Kata 2013 | Singapore - Opening Keynote: Running Lean & Scaling Fas...
AWS Cloud Kata 2013 | Singapore - Opening Keynote: Running Lean & Scaling Fas...
 
AWS Summit 2013 | India - Running High Churn Development & Test Environments,...
AWS Summit 2013 | India - Running High Churn Development & Test Environments,...AWS Summit 2013 | India - Running High Churn Development & Test Environments,...
AWS Summit 2013 | India - Running High Churn Development & Test Environments,...
 
Andy Jassy Keynote Sydney Customer Appreciation Day
Andy Jassy Keynote Sydney Customer Appreciation DayAndy Jassy Keynote Sydney Customer Appreciation Day
Andy Jassy Keynote Sydney Customer Appreciation Day
 
Unlocking the Value of your Data Featuring AWS Enterprise Use Cases
Unlocking the Value of your Data Featuring AWS Enterprise Use CasesUnlocking the Value of your Data Featuring AWS Enterprise Use Cases
Unlocking the Value of your Data Featuring AWS Enterprise Use Cases
 

Similaire à AWS Under the covers with Amazon DynamoDB IP Expo 2013

AWS Summit 2013 | Auckland - Big Data Analytics
AWS Summit 2013 | Auckland - Big Data AnalyticsAWS Summit 2013 | Auckland - Big Data Analytics
AWS Summit 2013 | Auckland - Big Data Analytics
Amazon Web Services
 

Similaire à AWS Under the covers with Amazon DynamoDB IP Expo 2013 (20)

Building Applications with DynamoDB
Building Applications with DynamoDBBuilding Applications with DynamoDB
Building Applications with DynamoDB
 
Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB
Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDBData & Analytics - Session 3 - Under the Covers with Amazon DynamoDB
Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB
 
Masterclass Webinar: Amazon DynamoDB July 2014
Masterclass Webinar: Amazon DynamoDB July 2014Masterclass Webinar: Amazon DynamoDB July 2014
Masterclass Webinar: Amazon DynamoDB July 2014
 
Conhecendo o DynamoDB
Conhecendo o DynamoDBConhecendo o DynamoDB
Conhecendo o DynamoDB
 
DynamoDB Deep Dive
DynamoDB Deep DiveDynamoDB Deep Dive
DynamoDB Deep Dive
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Aws Summit Berlin 2013 - Understanding database options on AWS
Aws Summit Berlin 2013 - Understanding database options on AWSAws Summit Berlin 2013 - Understanding database options on AWS
Aws Summit Berlin 2013 - Understanding database options on AWS
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
 
OLAP on the Cloud with Azure Databricks and Azure Synapse
OLAP on the Cloud with Azure Databricks and Azure SynapseOLAP on the Cloud with Azure Databricks and Azure Synapse
OLAP on the Cloud with Azure Databricks and Azure Synapse
 
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
 
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudFSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
 
Beyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the codeBeyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the code
 
AWS Summit 2013 | Auckland - Big Data Analytics
AWS Summit 2013 | Auckland - Big Data AnalyticsAWS Summit 2013 | Auckland - Big Data Analytics
AWS Summit 2013 | Auckland - Big Data Analytics
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
 
Demystifying Data Warehouse as a Service
Demystifying Data Warehouse as a ServiceDemystifying Data Warehouse as a Service
Demystifying Data Warehouse as a Service
 
AWS Enterprise Summit Netherlands - AWS IoT
AWS Enterprise Summit Netherlands - AWS IoTAWS Enterprise Summit Netherlands - AWS IoT
AWS Enterprise Summit Netherlands - AWS IoT
 
Amazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian MeyersAmazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian Meyers
 
Optimizing SIEM Performance
Optimizing SIEM PerformanceOptimizing SIEM Performance
Optimizing SIEM Performance
 
AWS re:Invent 2016| HLC301 | Data Science and Healthcare: Running Large Scale...
AWS re:Invent 2016| HLC301 | Data Science and Healthcare: Running Large Scale...AWS re:Invent 2016| HLC301 | Data Science and Healthcare: Running Large Scale...
AWS re:Invent 2016| HLC301 | Data Science and Healthcare: Running Large Scale...
 
Delivering Data Democratization in the Cloud with Snowflake
Delivering Data Democratization in the Cloud with SnowflakeDelivering Data Democratization in the Cloud with Snowflake
Delivering Data Democratization in the Cloud with Snowflake
 

Plus de Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

Plus de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Dernier

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Dernier (20)

Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 

AWS Under the covers with Amazon DynamoDB IP Expo 2013

  • 1. Under the covers with Dynamo DB Ian Meyers IP Expo 2013
  • 2. RDS Dynamo DB Redshift Deployment & Administration App Services Fully Managed, Provisioned throughput NoSQL Compute Storage Database Networking AWS Global Infrastructure database Fast, predictable, configurable performance Fully distributed, fault tolerant HA architecture Integration with EMR & Hive
  • 3. Consistent, predictable performance. Single digit millisecond latency. Backed by solid-state drives.
  • 4. Flexible data model. Key/attribute pairs. No schema required. Easy to create. Easy to adjust.
  • 5. Seamless scalability. No table size limits. Unlimited storage. No downtime.
  • 6. Durable. Consistent, disk only writes. Replication across data centers and availability zones.
  • 7. Without the operational burden. No Cluster to Manage No HA to Manage
  • 8. Consistent writes. Atomic increment and decrement. Optimistic concurrency control: conditional writes.
  • 9. Transactions. Native item level transactions only. Puts, updates and deletes are ACID. Transaction API for Java.
  • 10. Three decisions + three clicks = ready for use
  • 11. Level of throughput Primary keys Secondary Indexes Three decisions + three clicks = ready for use
  • 12. Level of throughput Primary keys Secondary Indexes Three decisions + three clicks = ready for use
  • 13. Provisioned throughput. Reserve IOPS for reads and writes. Scale up for down at any time.
  • 14. Pay per capacity unit. Priced per hour of provisioned throughput.
  • 15. Write throughput. Size of item x writes per second $0.0065 for 10 write units
  • 16. Strong or eventual consistency Read throughput. Read data will reflect all previous transactions. $0.0065 per hour for 50 units.
  • 17. Strong or eventual consistency Read throughput. Read data may reflect old values. $0.0065 per hour for 100 units.
  • 18. Strong or eventual consistency Read throughput. Same latency expectations. Mix and match at ‘read time’.
  • 20. Data is partitioned and managed by DynamoDB.
  • 21. Reserved capacity. Up to 53% for 1 year reservation. Up to 76% for 3 year reservation.
  • 22. Authentication. Session based to minimize latency. Uses the Amazon Security Token Service. Handled by AWS SDKs. Integrates with IAM.
  • 23. Monitoring. CloudWatch metrics: latency, consumed read and write throughput, errors and throttling.
  • 24. Indexing. Items are indexed by primary and secondary keys. Primary keys can be composite. Secondary keys index on other attributes.
  • 25. ID Date Total id = 100 date = 2012-05-16-09-00-10 total = 25.00 id = 101 date = 2012-05-15-15-00-11 total = 35.00 id = 101 date = 2012-05-16-12-00-10 total = 100.00 id = 102 date = 2012-03-20-18-23-10 total = 20.00 id = 102 date = 2012-03-20-18-23-10 total = 120.00
  • 26. Hash key ID Date Total id = 100 date = 2012-05-16-09-00-10 total = 25.00 id = 101 date = 2012-05-15-15-00-11 total = 35.00 id = 101 date = 2012-05-16-12-00-10 total = 100.00 id = 102 date = 2012-03-20-18-23-10 total = 20.00 id = 102 date = 2012-03-20-18-23-10 total = 120.00
  • 27. Hash key Range key ID Date Total Composite primary key id = 100 date = 2012-05-16-09-00-10 total = 25.00 id = 101 date = 2012-05-15-15-00-11 total = 35.00 id = 101 date = 2012-05-16-12-00-10 total = 100.00 id = 102 date = 2012-03-20-18-23-10 total = 20.00 id = 102 date = 2012-03-20-18-23-10 total = 120.00
  • 28. Hash key Range key Secondary range key ID Date Total id = 100 date = 2012-05-16-09-00-10 total = 25.00 id = 101 date = 2012-05-15-15-00-11 total = 35.00 id = 101 date = 2012-05-16-12-00-10 total = 100.00 id = 102 date = 2012-03-20-18-23-10 total = 20.00 id = 102 date = 2012-03-20-18-23-10 total = 120.00
  • 29. Conditional updates. PutItem, UpdateItem, DeleteItem can take optional conditions for operation. UpdateItem performs atomic increments.
  • 30. One API call, multiple items BatchGet returns multiple items by key. BatchWrite performs up to 25 put or delete operations. Throughput is measured by IO, not API calls.
  • 31. Query vs Scan Query for Composite Key queries. Scan for full table scans, exports. Both support pages and limits. Maximum response is 1Mb in size.
  • 32. Unlimited storage. Unlimited attributes per item. Unlimited items per table. Maximum of 64k per item.
  • 33. Split across items. message_id = 1 part = 1 message = <first 64k> message_id = 1 part = 2 message = <second 64k> message_id = 1 part = 3 joined = <third 64k>
  • 34. Store a pointer to S3. message_id = 1 message = http://s3.amazonaws.com... message_id = 2 message = http://s3.amazonaws.com... message_id = 3 message = http://s3.amazonaws.com...
  • 35. Time Series Data Separate Data by Throughput Required Hot Data in Tables with High Provisioned IO Older Data in Tables with Low IO
  • 37. April - 1000 Read IOPS, 1000 Write IOPS Hot and cold tables. event_id = 1000 timestamp = 2013-04-16-09-59-01 key = value event_id = 1001 timestamp = 2013-04-16-09-59-02 key = value event_id = 1002 timestamp = 2013-04-16-09-59-02 key = value March - 200 Read IOPS, 1 Write IOPS event_id = 1000 timestamp = 2013-03-01-09-59-01 key = value event_id = 1001 timestamp = 2013-03-01-09-59-02 key = value event_id = 1002 timestamp = 2013-03-01-09-59-02 key = value
  • 38. Archive data. Move old data to S3: lower cost. Still available for analytics. Run queries across hot and cold data with Elastic MapReduce.
  • 40. Uniform workload. Data stored across multiple partitions. Data is primarily distributed by primary key. Provisioned throughput is divided evenly across partitions.
  • 41. To achieve and maintain full provisioned throughput, spread workload evenly across hash keys.
  • 42. Non-Uniform workload. Might be throttled, even at high levels of throughput.
  • 43. BEST PRACTICE 1: Distinct values for hash keys. Hash key elements should have a high number of distinct values.
  • 44. Lots of users with unique user_id. Workload well distributed across hash key. user_id = mza first_name = Matt last_name = Wood user_id = jeffbarr first_name = Jeff last_name = Barr user_id = werner first_name = Werner last_name = Vogels user_id = simone first_name = Simone last_name = Brunozzi ... ... ...
  • 45. BEST PRACTICE 2: Avoid limited hash key values. Hash key elements should have a high number of distinct values.
  • 46. Small number of status codes. Unevenly, non-uniform workload. status = 200 date = 2012-04-01-00-00-01 status = 404 date = 2012-04-01-00-00-01 status 404 date = 2012-04-01-00-00-01 status = 404 date = 2012-04-01-00-00-01
  • 47. BEST PRACTICE 3: Model for even distribution. Access by hash key value should be evenly distributed across the dataset.
  • 48. Large number of devices. Small number which are much more popular than others. Workload unevenly distributed. mobile_id = 100 access_date = 2012-04-01-00-00-01 mobile_id = 100 access_date = 2012-04-01-00-00-02 mobile_id = 100 access_date = 2012-04-01-00-00-03 mobile_id = 100 access_date = 2012-04-01-00-00-04 ... ...
  • 49. Sample access pattern. Workload randomized by hash key. mobile_id = 100.1 access_date = 2012-04-01-00-00-01 mobile_id = 100.2 access_date = 2012-04-01-00-00-02 mobile_id = 100.3 access_date = 2012-04-01-00-00-03 mobile_id = 100.4 access_date = 2012-04-01-00-00-04 ... ...
  • 50. BEST PRACTICE 4: Distribute scans across dataset Improve retrieval times by scanning partitions concurrently using the Parallel Scan feature.
  • 51. Parallel Scan: separate thread for each table segment Worker Thread 0 Application Main Thread Worker Thread 1 Worker Thread 2
  • 53. Seamless scale. Scalable methods for data processing. Scalable methods for backup/restore.
  • 54. Amazon Elastic MapReduce. Managed Hadoop service for data-intensive workflows. aws.amazon.com/emr
  • 55. create external table items_db (id string, votes bigint, views bigint) stored by 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' tblproperties ("dynamodb.table.name" = "items", "dynamodb.column.mapping" = "id:id,votes:votes,views:views");
  • 56. select id, likes, views from items_db order by views desc;