LA Big Data Camp 2014: AWS DynamoDB Overview, Michael Limcaco

Michael Limcaco
Solutions Architect
Amazon Web Services
NoSQL in the Cloud: Amazon DynamoDB
Fast and durable at any scale

Databases in the Cloud
First, a little context

Traditional Database Architecture
Client Tier / App/Web Tier / RDBMS
One database for all workloads:
• key-value access
• complex queries
• transactions
• analytics

Cloud Data Tier Architecture
Client Tier / App/Web Tier / Data Tier
Data Tier: Cache, Data Warehouse, Blob Store, RDBMS, NoSQL, Search
Best database for each workload

Workload Driven Data Store Selection
Data Tier:
• Cache: hot reads
• Data Warehouse: analytics
• Blob Store: logging
• RDBMS: complex queries & transactions
• NoSQL: key/value, simple query
• Search: rich search

AWS Services for the Data Tier
Data Tier:
• Amazon ElastiCache: hot reads
• Amazon Redshift: analytics
• Amazon S3: logging
• Amazon RDS: complex queries & transactions
• Amazon DynamoDB: key/value, simple query
• Amazon CloudSearch: rich search

DynamoDB is a managed NoSQL database service.
Store and retrieve any amount of data.
Serve any level of request traffic.

Consistent, predictable performance.
Single-digit millisecond latency.
Backed by solid-state drives.

Flexible data model.
Key/attribute pairs. No schema required.

Rich Tooling
SDK/Libraries
JSON-Based Web API
IDE Plugins
CLI

Without the operational burden.

RDBMS = Default Choice
• An Amazon.com page is composed of responses from thousands of independent services
• Query patterns differ from service to service:
  - The catalog service is mostly key-value
  - The ordering service is very write-intensive (key-value)
  - Catalog search has a different query pattern
Relational Era @ Amazon.com
RDBMS: poor availability, limited scalability, high cost

Dynamo = NoSQL Technology
• Replicated DHT
• Consistent hashing
• Optimistic replication
• Quorum strategies
• Anti-entropy mechanisms
• Object versioning
Distributed Era @ Amazon.com
Drawbacks: lack of strong consistency, every engineer needs to learn distributed systems, operational complexity

DynamoDB = NoSQL Cloud Service
Cloud Era @ Amazon.com
Seamless Scalability
Fast & Predictable Performance
Easy Administration
Streamlined Development
Cost Effective

[Diagram: a table split into partitions 1..N]
• DynamoDB automatically partitions data by the hash key
  - The hash key spreads data (and workload) across partitions
• Auto-partitioning occurs with:
  - Data set size growth
  - Provisioned capacity increases
Massive and Seamless Scale
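To make "partitions data by the hash key" concrete, here is a minimal Python sketch. It assumes an MD5-based hash reduced modulo a fixed partition count; DynamoDB's real hash function and partition count are internal, and `partition_for` and `NUM_PARTITIONS` are hypothetical names.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical; DynamoDB decides the real count internally


def partition_for(hash_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a hash key to a partition index (illustrative, not DynamoDB's function)."""
    digest = hashlib.md5(hash_key.encode("utf-8")).digest()
    # Read the first 8 bytes as an unsigned integer, then reduce modulo N.
    return int.from_bytes(digest[:8], "big") % num_partitions


# Many distinct hash keys spread items (and their traffic) across partitions.
keys = [f"user-{i}" for i in range(1000)]
load = Counter(partition_for(k) for k in keys)
print(sorted(load.items()))
```

Modulo placement is used only for brevity. The Dynamo design listed earlier uses consistent hashing instead, so adding a partition relocates only a fraction of the keys.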

WRITES
• Continuously replicated to 3 facilities
• Quorum acknowledgment
• Persisted to disk (SSD)
READS
• Strongly or eventually consistent
• No trade-off in latency
Durable At Scale

Provisioned Throughput
• Request-based capacity provisioning model
• Throughput is declared and updated via the API or the console
  - CreateTable (foo, reads/sec = 100, writes/sec = 150)
  - UpdateTable (foo, reads/sec = 10000, writes/sec = 4500)
• DynamoDB handles the rest
  - Capacity is reserved and available when needed
  - Scaling up triggers repartitioning and reallocation
  - No impact on performance or availability
Predictable Performance
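The `CreateTable`/`UpdateTable` lines above are pseudocode. In the AWS SDK for Python (boto3), throughput is declared as read and write capacity units. The sketch below only assembles the keyword arguments for `client.create_table` and `client.update_table`; the table name `foo`, the string hash key `id`, and the helper names are placeholders, and sending the requests needs AWS credentials.

```python
def create_table_kwargs(name: str, hash_key: str, reads: int, writes: int) -> dict:
    """Keyword arguments for boto3's DynamoDB client.create_table (provisioned)."""
    return {
        "TableName": name,
        # Only key attributes are declared; all other attributes stay schema-less.
        "AttributeDefinitions": [{"AttributeName": hash_key, "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": hash_key, "KeyType": "HASH"}],
        "ProvisionedThroughput": {"ReadCapacityUnits": reads, "WriteCapacityUnits": writes},
    }


def update_throughput_kwargs(name: str, reads: int, writes: int) -> dict:
    """Keyword arguments for boto3's DynamoDB client.update_table."""
    return {
        "TableName": name,
        "ProvisionedThroughput": {"ReadCapacityUnits": reads, "WriteCapacityUnits": writes},
    }


# Mirrors the slide: start at 100 reads / 150 writes, later scale up.
create = create_table_kwargs("foo", "id", reads=100, writes=150)
update = update_throughput_kwargs("foo", reads=10000, writes=4500)
# With credentials configured:
#   client = boto3.client("dynamodb")
#   client.create_table(**create)
#   client.update_table(**update)
```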

WRITES
• Continuously replicated to 3 facilities
• Quorum acknowledgment
• Persisted to disk (SSD)
READS
• Strongly or eventually consistent
• No trade-off in latency
Low Latency At Scale

Making life easier for developers…
• Developers are freed from:
  - Performance tuning (latency)
  - Replication (automatic 3-way, multi-facility)
  - Scalability (and scaling operations)
  - Security inspections, patches, upgrades
  - Software upgrades, patches
  - Hardware failover (automatic)
  - Improving the underlying hardware
…and more!
Automated Operations

DynamoDB Concepts
Tables contain items; items contain attributes.
• Schema-less: only the key attributes are declared up front, and each attribute carries its own type

DynamoDB Concepts
Tables contain items; items contain attributes.
• Scalar data types: number, string, and binary
• Multi-valued types: string set, number set, and binary set
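In the JSON-based web API every attribute value is tagged with its type: `S`, `N`, `B` for the scalars and `SS`, `NS`, `BS` for the sets. A hand-written converter, as an illustration only (`to_attribute_value` is hypothetical; boto3 ships `boto3.dynamodb.types.TypeSerializer` for real use):

```python
import base64


def to_attribute_value(value):
    """Convert a Python value to DynamoDB's typed JSON attribute format (sketch)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not among the types on this slide")
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers travel as strings to preserve precision
    if isinstance(value, bytes):
        return {"B": base64.b64encode(value).decode("ascii")}  # base64 inside JSON
    # Sets must be non-empty and homogeneous; sorted only for deterministic output.
    if isinstance(value, (set, frozenset)) and value:
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in value):
            return {"NS": sorted(str(v) for v in value)}
        if all(isinstance(v, bytes) for v in value):
            return {"BS": sorted(base64.b64encode(v).decode("ascii") for v in value)}
    raise TypeError(f"unsupported value: {value!r}")


# Items in one table need not share attributes (schema-less).
item = {"User_ID": "1", "File_ID": "1", "Size": 10000, "Tags": {"photo", "2013"}}
wire = {name: to_attribute_value(v) for name, v in item.items()}
```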

DynamoDB Concepts
Hash keys
• Mandatory for all items in a table
• Key-value access pattern
  - Writes: PutItem, UpdateItem, DeleteItem, BatchWriteItem
  - Reads: GetItem, BatchGetItem

Hash = Distribution Key
[Diagram: items spread across partitions 1..N]
Hash keys
• Mandatory for all items in a table
• Key-value access pattern
• Determine data distribution

Hash = Distribution Key
Large number of unique hash keys
+ uniform distribution of workload across hash keys
= optimal schema design
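Why many unique, evenly used hash keys matter: in the provisioned model a table's throughput is divided across its partitions, so one hot key can saturate its partition while the rest sit idle. This sketch compares the busiest partition's share of requests under uniform traffic and under traffic where 90% of requests hit one key. The hash function, partition count, and traffic mix are assumptions for illustration.

```python
import hashlib
import random
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical


def partition_for(hash_key: str) -> int:
    """Illustrative placement: MD5 of the hash key, modulo the partition count."""
    digest = hashlib.md5(hash_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def busiest_share(requests: list) -> float:
    """Fraction of all requests that land on the most loaded partition."""
    load = Counter(partition_for(k) for k in requests)
    return max(load.values()) / len(requests)


rng = random.Random(42)
users = [f"user-{i}" for i in range(10_000)]
uniform = [rng.choice(users) for _ in range(100_000)]
# Skewed: 90% of requests target a single "celebrity" key.
skewed = ["user-0" if rng.random() < 0.9 else rng.choice(users) for _ in range(100_000)]

print(f"uniform: {busiest_share(uniform):.2f}, skewed: {busiest_share(skewed):.2f}")
```

With 8 partitions, the uniform case stays close to 1/8 per partition; the skewed case pushes roughly nine tenths of the traffic onto one partition, which is the situation the design guideline above avoids.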

Range = Query
Range keys (hash + range = composite primary key)
• Model 1:N relationships
• Enable rich query capabilities:
  - all items for a hash key
  - ==, <, >, >=, <=
  - “begins with”
  - “between”
  - sorted results
  - counts
  - top / bottom N values
  - paged responses
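A toy in-memory model of what a Query on a composite key does: pin one hash key, apply one condition to the range key, and return results sorted by range key, optionally reversed and limited for top/bottom N. `query`, `ORDERS`, and the order data are hypothetical. In the real API, direction and limit correspond to the `ScanIndexForward` and `Limit` Query parameters and counts to `Select="COUNT"`; paging is omitted here.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data: hash key -> {range key -> item}, e.g. a customer's orders by date.
ORDERS = {
    "cust-1": {
        "2013-01-05": {"total": 20},
        "2013-02-11": {"total": 35},
        "2013-02-27": {"total": 12},
        "2013-03-08": {"total": 50},
    },
}

_COMPARE = {
    "==": lambda k, v: k == v,
    "<": lambda k, v: k < v,
    "<=": lambda k, v: k <= v,
    ">": lambda k, v: k > v,
    ">=": lambda k, v: k >= v,
}


def query(table, hash_value, op=None, operand=None, descending=False, limit=None):
    """Items for ONE hash key, filtered on the range key, sorted by range key."""
    rows = sorted(table.get(hash_value, {}).items())  # ascending range-key order
    if op == "between":
        keys = [k for k, _ in rows]
        low, high = operand
        rows = rows[bisect_left(keys, low):bisect_right(keys, high)]  # inclusive
    elif op == "begins_with":
        rows = [r for r in rows if r[0].startswith(operand)]
    elif op is not None:
        rows = [r for r in rows if _COMPARE[op](r[0], operand)]
    if descending:
        rows.reverse()
    return rows[:limit]  # limit=None keeps every row


feb = query(ORDERS, "cust-1", "begins_with", "2013-02")
latest_two = query(ORDERS, "cust-1", descending=True, limit=2)  # top N
since_feb = len(query(ORDERS, "cust-1", ">=", "2013-02-01"))  # count
```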

Index Options
Local secondary indexes (LSI)
• Alternate range key + same hash key
• Index and table data are co-located (same partition)

Projected Attributes
• KEYS_ONLY
• INCLUDE
• ALL

Index Options
Global secondary indexes (GSI)
• Any attribute indexed as a new hash or range key
• Projected attributes: KEYS_ONLY, INCLUDE, ALL
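A sketch of what each projection type keeps in an index entry, assuming (as DynamoDB does) that every index entry carries the table's primary key plus the index key. `project` is a hypothetical helper, and the item is the File2 row from the media catalog example in this deck.

```python
def project(item: dict, key_attrs: list, projection: str, include=()) -> dict:
    """Return the copy of `item` an index would store under a projection type."""
    if projection == "ALL":
        return dict(item)
    if projection == "KEYS_ONLY":
        wanted = set(key_attrs)
    elif projection == "INCLUDE":
        wanted = set(key_attrs) | set(include)  # keys plus named non-key attributes
    else:
        raise ValueError(f"unknown projection: {projection}")
    return {k: v for k, v in item.items() if k in wanted}


item = {"User_ID": "1", "File_ID": "2", "Type": "MP4", "Name": "File2",
        "Size": 1000000, "Link": "bucket2"}
keys = ["User_ID", "File_ID", "Type"]  # table key + index range key (a Type index)

keys_only = project(item, keys, "KEYS_ONLY")
with_name = project(item, keys, "INCLUDE", include=["Name"])
```

Narrow projections keep index storage and write cost down. A query on a local secondary index can still ask for unprojected attributes (DynamoDB fetches them from the table at extra read cost), whereas a global secondary index returns only what it projects.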

Example Patterns
• Access pattern modeling
• Use case walk-thru
• Highlighted features

• Method
  1. Describe the overall use case – maintain context
  2. Identify the individual access patterns of the use case
  3. Model each access pattern to its own discrete data set
  4. Consolidate data sets into tables and indexes
• Benefits
  - Single table fetch for each query
  - Payloads are minimal for each access
Access Pattern Modeling

Multi-tenant application for file storing and sharing
• User_ID is the unique identifier of each user
• File_ID is the unique identifier of each file, owned by a user
Good PK selection: User_ID (hash) + File_ID (range)
Steps: use case → access patterns → data design
Design Use Case: Media Catalog

1. Users should be able to query all the files they own
2. Search by File Name
3. Search by File Type
4. Search by Date Range
5. Keep track of Shared Files
Patterns 2-5 rely on additional (non-PK) attributes, which are the index candidates.

Users
  Hash key = User_ID
  Attributes = User_Name, Email, Address
User_Files
  Hash key = User_ID
  Range key = File_ID
  Attributes = Name, Type, Size (N), Date, SharedFlag, Link
DynamoDB Data Model: Main Tables
User has file[]

+ Secondary Indexes
Table Name | Index Name      | Attribute to Index | Projected Attributes
User_Files | NameIndex       | Name               | KEYS
User_Files | TypeIndex       | Type               | KEYS + Name
User_Files | DateIndex       | Date               | KEYS + Name
User_Files | SharedFlagIndex | SharedFlag         | KEYS + Name
User_Files | SizeIndex       | Size               | KEYS + Name
Example only: the data each query must return determines the optimal projections.
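One way to declare User_Files and its five indexes as boto3 `create_table` arguments. Assumptions: every index is local (all share the User_ID hash key), key attributes are strings except the numeric Size, and the capacity numbers are placeholders. The `lsi` helper is hypothetical.

```python
def lsi(name: str, range_attr: str, include=None) -> dict:
    """One LocalSecondaryIndexes entry: same hash key (User_ID), new range key."""
    projection = {"ProjectionType": "KEYS_ONLY"}
    if include:
        projection = {"ProjectionType": "INCLUDE", "NonKeyAttributes": list(include)}
    return {
        "IndexName": name,
        "KeySchema": [
            {"AttributeName": "User_ID", "KeyType": "HASH"},
            {"AttributeName": range_attr, "KeyType": "RANGE"},
        ],
        "Projection": projection,
    }


user_files = {
    "TableName": "User_Files",
    # Every attribute used as a key by the table or an index must be declared.
    "AttributeDefinitions": [
        {"AttributeName": "User_ID", "AttributeType": "S"},
        {"AttributeName": "File_ID", "AttributeType": "S"},
        {"AttributeName": "Name", "AttributeType": "S"},
        {"AttributeName": "Type", "AttributeType": "S"},
        {"AttributeName": "Date", "AttributeType": "S"},
        {"AttributeName": "SharedFlag", "AttributeType": "S"},
        {"AttributeName": "Size", "AttributeType": "N"},
    ],
    "KeySchema": [
        {"AttributeName": "User_ID", "KeyType": "HASH"},
        {"AttributeName": "File_ID", "KeyType": "RANGE"},
    ],
    "LocalSecondaryIndexes": [
        lsi("NameIndex", "Name"),
        lsi("TypeIndex", "Type", include=["Name"]),
        lsi("DateIndex", "Date", include=["Name"]),
        lsi("SharedFlagIndex", "SharedFlag", include=["Name"]),
        lsi("SizeIndex", "Size", include=["Name"]),
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
}
# boto3.client("dynamodb").create_table(**user_files)  # needs AWS credentials
```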

• Find all files owned by a user
  - Query User_Files table (User_ID = “2”)
Access Pattern 1
User_ID (Hash) | File_ID (Range) | Name  | Date       | Type | SharedFlag | Size    | Link
1              | 1               | File1 | 2013-04-23 | JPG  |            | 10000   | bucket1
1              | 2               | File2 | 2013-03-10 | MP4  | Y          | 1000000 | bucket2
2              | 3               | File3 | 2013-03-10 | MP4  | Y          | 2000000 | bucket3
2              | 4               | File4 | 2013-03-10 | AVI  |            | 3000000 | bucket4
3              | 5               | File5 | 2013-04-10 | MP3  |            | 40000   | bucket5
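Access pattern 1 needs no index: it is a Query on the table's hash key, returning the user's files in File_ID order. An in-memory sketch over the sample rows above (`USER_FILES` and `query_user_files` are hypothetical names):

```python
# Sample rows from the slide: (User_ID, File_ID) -> remaining attributes.
# Rows without SharedFlag simply omit the attribute (schema-less items).
USER_FILES = {
    ("1", "1"): {"Name": "File1", "Date": "2013-04-23", "Type": "JPG", "Size": 10000, "Link": "bucket1"},
    ("1", "2"): {"Name": "File2", "Date": "2013-03-10", "Type": "MP4", "SharedFlag": "Y", "Size": 1000000, "Link": "bucket2"},
    ("2", "3"): {"Name": "File3", "Date": "2013-03-10", "Type": "MP4", "SharedFlag": "Y", "Size": 2000000, "Link": "bucket3"},
    ("2", "4"): {"Name": "File4", "Date": "2013-03-10", "Type": "AVI", "Size": 3000000, "Link": "bucket4"},
    ("3", "5"): {"Name": "File5", "Date": "2013-04-10", "Type": "MP3", "Size": 40000, "Link": "bucket5"},
}


def query_user_files(user_id: str) -> list:
    """Like Query(User_Files, User_ID = user_id): all of a user's items, range-key order."""
    return [
        {"User_ID": u, "File_ID": f, **attrs}
        for (u, f), attrs in sorted(USER_FILES.items())
        if u == user_id
    ]


names = [item["Name"] for item in query_user_files("2")]
```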

• Search by File Name
  - Query
    • IndexName = “NameIndex”
    • User_ID = “1”
    • Name = “File1”
Access Pattern 2
User_ID (hash) | Name (range) | File_ID
1              | File1        | 1
1              | File2        | 2
2              | File3        | 3
2              | File4        | 4
3              | File5        | 5
NameIndex
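With boto3, access pattern 2 becomes a `query` call against the index. `Name` is on DynamoDB's reserved-word list, so the sketch refers to it through an `ExpressionAttributeNames` placeholder. `KeyConditionExpression` is the current syntax (the older `KeyConditions` parameter expresses the same thing); the helper name is hypothetical, and only the request arguments are built here.

```python
def name_index_query(user_id: str, file_name: str) -> dict:
    """Keyword arguments for boto3 client.query: exact name lookup via NameIndex."""
    return {
        "TableName": "User_Files",
        "IndexName": "NameIndex",
        # A local index query must pin the hash key; "Name" goes through #n
        # because NAME is a reserved word in DynamoDB expressions.
        "KeyConditionExpression": "User_ID = :uid AND #n = :name",
        "ExpressionAttributeNames": {"#n": "Name"},
        "ExpressionAttributeValues": {":uid": {"S": user_id}, ":name": {"S": file_name}},
    }


request = name_index_query("1", "File1")
# boto3.client("dynamodb").query(**request)  # needs AWS credentials
```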

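• Search for file names by file type
  - Query
    • IndexName = “TypeIndex”
    • User_ID = a given user (a local index query always names one hash key)
    • Type = “MP4”
Access Pattern 3
User_ID (hash) | Type (range) | File_ID | Name
1              | JPG          | 1       | File1
1              | MP4          | 2       | File2
2              | AVI          | 4       | File4
2              | MP4          | 3       | File3
3              | MP3          | 5       | File5
TypeIndex (projected: KEYS + Name)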
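Conceptually, a local index holds the same items re-sorted by the new range key within each hash key, carrying only the projected attributes. This sketch derives TypeIndex (keys + Name) from the sample rows and answers "User_ID = 2, Type = MP4"; `ROWS`, `type_index`, and `by_type` are hypothetical names, and DynamoDB maintains real indexes automatically on every write.

```python
ROWS = [  # (User_ID, File_ID, Name, Type) from the sample table
    ("1", "1", "File1", "JPG"), ("1", "2", "File2", "MP4"),
    ("2", "3", "File3", "MP4"), ("2", "4", "File4", "AVI"),
    ("3", "5", "File5", "MP3"),
]

# Index entries ordered by (hash key, index range key, table range key).
type_index = sorted(
    (user, ftype, file_id, name) for user, file_id, name, ftype in ROWS
)


def by_type(user_id: str, ftype: str) -> list:
    """Names of one user's files of one type, answered from the index alone."""
    return [name for u, t, _, name in type_index if u == user_id and t == ftype]
```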

• Search for file names by date range
  - Query
    • IndexName = “DateIndex”
    • User_ID = a given user (a local index query always names one hash key)
    • Date between “2013-03-01” and “2013-03-29”
Access Pattern 4
User_ID (hash) | Date (range) | File_ID | Name
1              | 2013-03-10   | 2       | File2
1              | 2013-04-23   | 1       | File1
2              | 2013-03-10   | 3       | File3
2              | 2013-03-10   | 4       | File4
3              | 2013-04-10   | 5       | File5
DateIndex (projected: KEYS + Name)
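Storing Date as an ISO 8601 string (`YYYY-MM-DD`) is what makes this range query work: zero-padded ISO dates sort the same way as strings and as calendar dates, so "between" on the range key is one contiguous slice. A sketch over user 1's DateIndex entries using `bisect`; `dates_between` is a hypothetical helper.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# DateIndex entries for User_ID = "1": (Date, File_ID, Name), sorted by Date.
user1 = sorted([("2013-04-23", "1", "File1"), ("2013-03-10", "2", "File2")])


def dates_between(entries, start: str, end: str):
    """Inclusive BETWEEN on an ISO-date range key, as in the DateIndex query."""
    keys = [d for d, _, _ in entries]
    return entries[bisect_left(keys, start):bisect_right(keys, end)]


hits = dates_between(user1, "2013-03-01", "2013-03-29")

# String order agrees with calendar order for zero-padded ISO dates.
iso = ["2013-04-23", "2013-03-10", "2012-12-31"]
in_calendar_order = sorted(iso) == sorted(iso, key=date.fromisoformat)
```

If the stored values ever carry a time part (for example `2013-03-29T10:00`), an end bound of `"2013-03-29"` would exclude that day's later entries, so bounds must use the same granularity as the data.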

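• Search for names of shared files
  - Query
    • IndexName = “SharedFlagIndex”
    • User_ID = a given user (a local index query always names one hash key)
    • SharedFlag = “Y”
Access Pattern 5
User_ID (hash) | SharedFlag (range) | File_ID | Name
1              | Y                  | 2       | File2
2              | Y                  | 3       | File3
SharedFlagIndex (projected: KEYS + Name)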
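SharedFlagIndex holds only two entries because it is sparse: an item without the index's range-key attribute is not written to the index, so only files flagged "Y" appear. A sketch of that rule over the sample rows (`ITEMS` and `build_sparse_index` are hypothetical names):

```python
ITEMS = [
    {"User_ID": "1", "File_ID": "1", "Name": "File1"},
    {"User_ID": "1", "File_ID": "2", "Name": "File2", "SharedFlag": "Y"},
    {"User_ID": "2", "File_ID": "3", "Name": "File3", "SharedFlag": "Y"},
    {"User_ID": "2", "File_ID": "4", "Name": "File4"},
    {"User_ID": "3", "File_ID": "5", "Name": "File5"},
]


def build_sparse_index(items, range_attr, projected=("Name",)):
    """Index entries only for items that carry `range_attr`; others are skipped."""
    entries = []
    for item in items:
        if range_attr not in item:
            continue  # no index key value, so no index entry
        entry = {k: item[k] for k in ("User_ID", range_attr, "File_ID")}
        entry.update({k: item[k] for k in projected if k in item})
        entries.append(entry)
    return sorted(entries, key=lambda e: (e["User_ID"], e[range_attr], e["File_ID"]))


shared = build_sparse_index(ITEMS, "SharedFlag")
```

This is why the flag is stored only on shared files instead of writing "N" on every other file: an "N" value would put every item into the index.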

• Schema-less
  - Only key information needed
  - Individual items can define their own set of attributes
• Strongly Consistent Reads
  - Inventory, shopping cart applications
• Atomic Counters
  - Increment and return the new value in the same operation
• Conditional Writes
  - Expected value checked before write – fails on mismatch
  - “State machine” use cases
Highlighted Features
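A single-process toy model of the last two features; `ToyTable` and its methods are hypothetical, not the DynamoDB API. In the service these correspond to `UpdateItem` with an expected-value condition (`Expected`, or `ConditionExpression` in newer SDKs), which fails with `ConditionalCheckFailedException` on mismatch, and to an `ADD` update with `ReturnValues="UPDATED_NEW"`.

```python
import threading


class ConditionalCheckFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""


class ToyTable:
    """Single-process model of conditional writes and atomic counters."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()  # serializes operations so each is atomic

    def put(self, key, item):
        with self._lock:
            self._items[key] = dict(item)

    def get(self, key):
        with self._lock:
            return dict(self._items.get(key, {}))

    def update_if(self, key, attr, expected, new):
        """Conditional write: set attr to new only if it currently equals expected."""
        with self._lock:
            item = self._items.get(key, {})
            if item.get(attr) != expected:
                raise ConditionalCheckFailed(f"{attr}={item.get(attr)!r}, expected {expected!r}")
            self._items[key] = {**item, attr: new}

    def increment(self, key, attr, by=1):
        """Atomic counter: add `by` and return the new value in one operation."""
        with self._lock:
            item = self._items.setdefault(key, {})
            item[attr] = item.get(attr, 0) + by
            return item[attr]


# "State machine": an order may move from PLACED to SHIPPED only once.
orders = ToyTable()
orders.put("order-1", {"state": "PLACED"})
orders.update_if("order-1", "state", "PLACED", "SHIPPED")
try:
    orders.update_if("order-1", "state", "PLACED", "CANCELLED")  # stale expectation
    rejected = False
except ConditionalCheckFailed:
    rejected = True  # write refused; state is still "SHIPPED"
views = [orders.increment("order-1", "views") for _ in range(3)]
```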

Hadoop Integration
+ Amazon Elastic MapReduce (EMR)
Managed Hadoop service for
data-intensive workflows.

Define External Table (Hive)
create external table items_db
(id string, votes bigint, views bigint) stored by
'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
tblproperties
("dynamodb.table.name" = "items",
"dynamodb.column.mapping" =
"id:id,votes:votes,views:views");

Query It
select id, votes, views
from items_db
order by views desc;

What Else?
• Autoscaling library
• Local testing and development
• Cross-region export / import

• Third party library for automating scaling decisions
• Scale up for service levels, scale down for cost
• CloudFormation template for fast deployment
Autoscaling with Dynamic DynamoDB

• Cross-Region Export and Import
• DynamoDB Local
  - Disconnected development with full API support
    • No network
    • No usage costs
    • No SLA
• Geospatial and Transaction Libraries
• Fine-Grained Access Control
  - Direct-to-DynamoDB access for mobile devices
Other Key Features
Get started today!
aws.amazon.com/dynamodb/developer-resources/

Managed NoSQL =
seamless scalability, predictable performance, always durable,
automated operations, fast development, cost effective

Thank You
aws.amazon.com/dynamodb
