Cloud computing with Amazon Web Services, Part 5: Dataset
processing in the cloud with SimpleDB
Amazon SimpleDB
Amazon SDB is a fast, scalable real-time dataset indexing and querying framework that
makes it easy to store and retrieve structured data for your Amazon Web Services-based
applications. It's designed to work well with the other Amazon Web Services, such as
Elastic Compute Cloud (EC2) and Simple Storage Service (S3). SDB enables you to
build your entire application stack within the Amazon Web Services environment. You
pay for the service based entirely upon your usage. There is also a free tier of service
available.
Some valuable features provided by SDB:
Reliability
SDB is designed to store your indexed data redundantly across multiple data
centers and to make them available at all times.
Speed
SDB is designed to provide quick retrieval of data, especially if your requests are
made from within the Amazon Web Services environment from an EC2 instance.
Simplicity
The programming model for accessing and using SDB is simple and can be used
from a variety of programming languages.
Security
SDB is designed to provide a high level of security. Access to the data is
restricted to authorized users.
Flexibility
SDB gives you the ability to store data on the fly without any need for pre-defined
schemas.
Low cost
SDB is priced by usage, so you are charged only for what you actually
use.
The rest of this section explores the concepts that underpin SDB.
Domains
A domain is a container that lets you store your structured data and run queries against it.
The data is stored in a domain as items. Conceptually, a domain is similar to a worksheet
tab in a spreadsheet; items are rows in the spreadsheet. You can run queries against a
domain, but you cannot yet query across domains in the current version of SDB.
Each domain has the following metadata associated with it:
• Date and time the metadata was last updated
• Number of all items in the domain
• Number of all attribute name-value pairs in the domain
• Number of unique attribute names in the domain
• Total size of all item names in the domain, in bytes
• Total size of all attribute values, in bytes
• Total size of all unique attribute names, in bytes
SDB, like Simple Queue Service (SQS), follows the "eventual consistency" model. SDB
maintains multiple copies of each domain for fault tolerance. Every change made to a
domain is propagated across all copies. Because this propagation sometimes takes a few
seconds, depending on system load and network latency, a consumer of your domain may
not see the changes immediately. Changes will eventually be propagated throughout SDB,
but this delay is an important consideration when designing your SDB-based applications.
Amazon CTO Werner Vogels discusses the reasoning behind the concept of eventual
consistency on his blog.
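One common way to live with this delay is to re-read until the expected value shows up or a deadline passes. The following is a minimal sketch written for a current Python 3 interpreter, not something taken from boto: wait_until_visible, make_lagging_reader, and the timing parameters are hypothetical names, and the lagging reader only simulates a copy of the domain that catches up after a few reads.

```python
import time


def wait_until_visible(fetch, expected, timeout=5.0, interval=0.5):
    """Call fetch() until it returns `expected` or `timeout` seconds elapse.

    Returns True if the expected value was seen, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if fetch() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def make_lagging_reader(stale, fresh, stale_reads):
    """Simulate a copy that returns `stale` for the first few reads."""
    state = {"reads": 0}

    def fetch():
        state["reads"] += 1
        return stale if state["reads"] <= stale_reads else fresh

    return fetch


# The first two reads still see the old value; the third sees the update.
reader = make_lagging_reader({"cars": "BMW"}, {"cars": "Honda"}, stale_reads=2)
print(wait_until_visible(reader, {"cars": "Honda"}, interval=0.01))  # True
```

In a real application the fetch callable would wrap whatever read you perform against SDB, and the timeout would reflect how long your application can tolerate stale data.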
Items
Items represent individual objects within your domains, and they contain attributes with
values. Each item is conceptually similar to a row in a spreadsheet: an attribute is a
column and its values are cells. An attribute is not restricted to a single value; it can
hold multiple values. SDB automatically indexes your domains regardless of how the
data is structured.
SDB also has a time limit for executing any single query against your domains. If a query
takes longer than 5 seconds, SDB will stop the query and return an error.
Domains in SDB are flexible and don't have any fixed schemas. Each item within a
domain can contain a unique set of up to 256 attributes. The attributes can even be
completely different from all other attributes for the other items within that domain.
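To make the schema-free model concrete, the sketch below mimics it with plain Python dictionaries. It is only an in-memory illustration under stated assumptions: add_value and MAX_PAIRS_PER_ITEM are hypothetical names, and the limit is read as 256 name-value pairs per item, as Table 1 states it.

```python
MAX_PAIRS_PER_ITEM = 256  # per-item limit, read as name-value pairs


def add_value(item, name, value):
    """Add one name-value pair to an item modeled as {name: set(values)}."""
    pairs = sum(len(values) for values in item.values())
    already_stored = value in item.get(name, set())
    if not already_stored and pairs >= MAX_PAIRS_PER_ITEM:
        raise ValueError("item already holds %d name-value pairs" % pairs)
    item.setdefault(name, set()).add(value)


# Two items in the same domain with entirely different attributes.
domain = {"car1": {}, "pantry1": {}}
add_value(domain["car1"], "make", "BMW")
for fruit in ("apple", "orange", "mango"):
    add_value(domain["pantry1"], "fruits", fruit)  # one attribute, three values

print(sorted(domain["car1"]))               # ['make']
print(sorted(domain["pantry1"]["fruits"]))  # ['apple', 'mango', 'orange']
```

Modeling the values as a set reflects the idea that an attribute holds a collection of distinct values rather than a single cell.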
Limitations
The current version of SDB has limitations that you should consider when designing your
application. Table 1 shows the limitations (as specified by Amazon in its latest
documentation).
Table 1. Current limitations
Parameter                                                   Current restrictions
Domain size                                                 10 GB per domain
                                                            250,000,000 attribute name-value pairs
Domain name                                                 3-255 characters (a-z, A-Z, 0-9, '_', '-', and '.')
Domains per Amazon Web Services account                     100
Attributes                                                  Name-value pairs per item: 256.
                                                            Name length: 1024 bytes.
                                                            Value length: 1024 bytes.
                                                            Only allowed characters are UTF-8 characters that are
                                                            valid in XML documents. Control characters and any
                                                            sequences that are not valid in XML are not allowed.
                                                            Per PutAttributes operation: limited to 100.
                                                            Requested per Select or QueryWithAttributes
                                                            operation: limited to 256.
Maximum items in query response                             256
Maximum query execution time                                5 seconds
Maximum predicates per query expression                     10
Maximum comparisons per query expression predicate          10
Maximum number of unique attributes per select expression   20
Maximum number of comparisons per select expression         20
Maximum response size for QueryWithAttributes and Select    1 MB
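Some of these limits are easy to check on the client before a request is sent. The sketch below is one possible pre-flight check, assuming the 3-255 character rule in Table 1 applies to domain names; is_valid_domain_name and check_put are hypothetical helpers and are not part of boto or SDB.

```python
import re

# Limits copied from Table 1; the helper names are illustrative only.
DOMAIN_NAME_RE = re.compile(r"[A-Za-z0-9_.-]{3,255}\Z")
MAX_NAME_BYTES = 1024
MAX_VALUE_BYTES = 1024
MAX_PAIRS_PER_PUT = 100


def is_valid_domain_name(name):
    """True if name is 3-255 characters from a-z, A-Z, 0-9, '_', '-', '.'."""
    return DOMAIN_NAME_RE.match(name) is not None


def check_put(pairs):
    """Return a list of problems found in one batch of (name, value) strings."""
    problems = []
    if len(pairs) > MAX_PAIRS_PER_PUT:
        problems.append("more than %d pairs in one put" % MAX_PAIRS_PER_PUT)
    for name, value in pairs:
        if len(name.encode("utf-8")) > MAX_NAME_BYTES:
            problems.append("name too long: %s..." % name[:20])
        if len(value.encode("utf-8")) > MAX_VALUE_BYTES:
            problems.append("value too long for %s" % name)
    return problems


print(is_valid_domain_name("devworks-dom-1"))  # True
print(is_valid_domain_name("ab"))              # False: shorter than 3 characters
print(check_put([("desc", "x" * 2000)]))       # ['value too long for desc']
```

Lengths are measured after UTF-8 encoding because the table states the name and value limits in bytes, not characters.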
Pricing
Amazon provides a free tier for SDB, along with pricing for usage above the free tier
limit. The charges are based on:
• The machine usage of each SDB request.
• The amount of machine capacity used for completing the specified request,
normalized to the hourly capacity of a 1.7-GHz Xeon processor.
Free tier
There are no charges on the first 25 machine hours, 1 GB of data transfer, and 1 GB of
storage that you consume every month, at least until 1 Jun 2009. This is a significant
amount of usage being provided for free for a limited time by Amazon. Many types of
applications can operate very easily within this free tier. Table 2 shows example pricing.
Table 2. Pricing for machine utilization
Quantity                    Cost
First 25 machine hours      Free
Additional machine hours    $0.14 per machine hour
Table 3 addresses the amount of data transferred to and from SDB. There is no charge for
data transferred between SDB and other Amazon Web Services within the same region.
Data transferred between SDB and other Amazon Web Services across regions will be
charged at Internet Data Transfer rates on both sides of the transfer.
Table 3. Pricing for data transfer
Type of transfer     Cost
All data transfer    First 1 GB of data transfer in is free
                     $0.100 per GB: all additional data transfer in
                     First 1 GB of data transfer out is free
                     $0.170 per GB: first 10 TB/month data transfer out
                     $0.130 per GB: next 40 TB/month data transfer out
                     $0.110 per GB: next 100 TB/month data transfer out
                     $0.100 per GB: data transfer out over 150 TB/month
Table 4 outlines costs for structured data storage.
Table 4. Structured data storage
Amount of storage    Cost
All data storage     First 1 GB of data is free.
                     $0.25 per GB/month: all additional data storage
For the latest pricing, check Amazon SDB. You can also use the Simple Monthly
Calculator provided by Amazon for calculating your monthly usage costs for SDB and
the other Amazon Web Services.
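For a rough feel of how the figures in Tables 2 through 4 combine, the sketch below estimates one month's bill. It is an approximation under stated assumptions, not Amazon's billing logic: the function names are hypothetical, 1 TB is taken as 1024 GB, the transfer-out tiers are applied to the gigabytes left after the free allowance, and the prices are the ones quoted above, which may be out of date.

```python
from decimal import Decimal as D

GB_PER_TB = 1024  # assumption: the tables do not define the size of a TB

# (tier size in GB, price per GB); None means "everything above".
TRANSFER_OUT_TIERS = [
    (10 * GB_PER_TB, D("0.170")),
    (40 * GB_PER_TB, D("0.130")),
    (100 * GB_PER_TB, D("0.110")),
    (None, D("0.100")),
]


def billable(used, free):
    """Usage above the free allowance, never negative."""
    return max(D(str(used)) - D(str(free)), D(0))


def tiered_cost(quantity, tiers):
    """Charge `quantity` against successive (size, unit_price) tiers."""
    total = D(0)
    remaining = D(str(quantity))
    for size, price in tiers:
        used = remaining if size is None else min(remaining, D(size))
        total += used * price
        remaining -= used
        if remaining <= 0:
            break
    return total


def monthly_cost(machine_hours, gb_in, gb_out, gb_stored):
    """Estimate one month's SDB charges in dollars, rounded to cents."""
    cost = billable(machine_hours, 25) * D("0.14")
    cost += billable(gb_in, 1) * D("0.100")
    cost += tiered_cost(billable(gb_out, 1), TRANSFER_OUT_TIERS)
    cost += billable(gb_stored, 1) * D("0.25")
    return cost.quantize(D("0.01"))


print(monthly_cost(machine_hours=20, gb_in=0.5, gb_out=1, gb_stored=1))  # 0.00
print(monthly_cost(machine_hours=30, gb_in=3, gb_out=11, gb_stored=5))   # 3.60
```

The first call stays entirely inside the free tier. The second adds 5 machine hours ($0.70), 2 GB in ($0.20), 10 GB out ($1.70), and 4 GB of storage ($1.00).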
Getting started with SDB
To start exploring SDB, you need to sign up for an Amazon Web Services account (see
Resources). See Part 2 of this series for detailed instructions on signing up for Amazon
Web Services. Once you have an Amazon Web Services account, you must enable
Amazon SDB service for your account:
1. Log in to your Amazon Web Services account.
2. Navigate to the SDB home page.
3. Click Sign Up For This Web Service on the right side.
4. Provide the requested information and complete the sign-up process.
All communication with any of the Amazon Web Services is through either the SOAP
interface or the query interface. In this article, you use the query interface via a
third-party library to communicate with SDB.
You will need to obtain your access keys, which you can access from your Web Services
Account information page by selecting View Access Key Identifiers. You are now set up
to use Amazon Web Services and have enabled SDB service for your account.
Interacting with SDB
For this example, you use a third-party open source Python library named boto to become
familiar with SDB by running small snippets of code in a Python shell.
Install boto and set up your environment
Download boto. The latest version, as of the writing of this article, was 1.6b. Unzip the
archive to the directory of your choice. Change into this directory and run setup.py to
install boto into your local Python environment, as shown in Listing 1.
Listing 1. Install boto
$ cd directory_where_you_unzipped_boto
$ python setup.py install
Set up some environment variables to point to the Amazon Web Services access keys.
The access keys are available from the Web Services Account information.
Listing 2. Set up environment variables
# Export variables with your AWS access keys
$ export AWS_ACCESS_KEY_ID=Your_AWS_Access_Key_ID
$ export AWS_SECRET_ACCESS_KEY=Your_AWS_Secret_Access_Key
Check to make sure everything is set up correctly by starting a Python shell and
importing the boto library, as shown in Listing 3.
Listing 3. Check the setup
$ python
Python 2.4.5 (#1, Apr 12 2008, 02:18:19)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import boto
>>>
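The same session can also confirm that the two variables from Listing 2 are visible to Python before any connection is attempted. This is a small sketch that is independent of boto; missing_aws_keys is a hypothetical helper name.

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_keys(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_aws_keys()
if missing:
    print("Set these variables before connecting: " + ", ".join(missing))
else:
    print("Both access-key variables are set.")
```

Passing a dictionary instead of relying on os.environ makes the check easy to exercise without touching your real environment.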
Explore SDB with boto
Use the SDBConnection class to provide the main interface for the interaction with SDB.
You will use boto from the Python console. The example calls different methods on the
SDBConnection object and examines the responses returned by SDB, which will help
you get familiar with the API while you explore the concepts behind SDB.
The first step is to create a connection object to SDB using the Amazon Web Services
access keys you exported earlier to your environment. The boto library always checks the
environment first to see if these variables are set. If they are set, boto automatically uses
them when it creates the connection.
Listing 4. Create a connection to SDB
>>> import boto
>>> sdb_conn = boto.connect_sdb()
>>>
For the rest of this article, you can use the sdb_conn object, created above, to interact
with SDB. You can create new domains by specifying a name for the domain.
Listing 5. Create a domain
>>> d1 = sdb_conn.create_domain('devworks-dom-1')
>>>
You can retrieve a list of all your domains. The call returns a result set object that
behaves much like a Python list, as shown in Listing 6, so you can iterate over it and
access each domain.
Listing 6. List all the domains
>>> all_domains = sdb_conn.get_all_domains()
>>>
>>> len(all_domains)
1
>>>
>>> for d in all_domains:
...     print d.name
...
devworks-dom-1
You can also retrieve a single domain by name.
Listing 7. List single domain
>>> my_domain = sdb_conn.get_domain('devworks-dom-1')
>>>
>>> print my_domain.name
devworks-dom-1
Newly created domains are, of course, empty until you add items to them. You create a
new item within a domain, then add attributes to it.
Listing 8. Create new item
>>> my_domain = sdb_conn.get_domain('devworks-dom-1')
>>>
>>> i1 = my_domain.new_item('test_item_1')
>>>
>>> i1['cars'] = 'BMW'
>>>
>>> i1['fruits'] = ['apple', 'orange', 'mango']
>>>
Items can be retrieved from a domain by specifying the item name, which must be
unique. This is similar to the concept of a primary key in a relational database.
Listing 9. Retrieve an item and its attributes
>>> my_item = my_domain.get_item('test_item_1')
>>>
>>> print my_item
{u'cars': u'BMW', u'fruits': [u'apple', u'mango', u'orange']}
>>>
The item object returned above is a live Item object that will automatically retrieve all
attributes for this item from SDB when you access any of its attributes. Any updates
made to the values of the attributes for this item will be saved automatically to SDB.
Listing 10. Update attributes
>>> my_item['cars']
u'BMW'
>>>
>>> my_item['cars'] = 'Honda'
>>>
>>> my_item['cars']
'Honda'
>>>
You can also retrieve items and attributes by using the SDBConnection class and
specifying the domain and item names.
Listing 11. Retrieve an item using SDBConnection
>>>
>>> sdb_conn.get_attributes('devworks-dom-1','test_item_1')
{u'cars': u'Honda', u'fruits': [u'apple', u'mango', u'orange']}
>>>
An item is automatically deleted by SDB if it does not have any attributes. You can also
specifically delete an item and its attributes.
Listing 12. Delete an item and its attributes
>>> sdb_conn.get_attributes('devworks-dom-1','test_item_1')
{u'cars': u'Honda', u'fruits': [u'apple', u'mango', u'orange']}
>>>
>>> sdb_conn.delete_attributes('devworks-dom-1','test_item_1')
True
>>> sdb_conn.get_attributes('devworks-dom-1','test_item_1')
{}
>>>
Listing 13. Delete a domain
>>> sdb_conn.delete_domain('devworks-dom-1')
True
>>>
Querying SDB domains
To search your structured data, SDB provides a custom query language that operates on the
attribute name-value pairs associated with items. The basic component of a query
expression is called a predicate. Each predicate is enclosed in square brackets and
consists of an attribute name, a comparison operator, and a value to compare. For example,
the predicate ['desc' = 'Hello Devworks'] defines an equality comparison on
the attribute desc. Each predicate is evaluated separately and produces a set of item
names. You can combine multiple predicates using set operations like union and
intersection to build complex queries.
When using predicates in your queries, it's important to consider that all predicate
comparisons are performed lexicographically by SDB. You must ensure that your data is
stored in attributes using the appropriate string representation. Keep in mind that queries
taking longer than 5 seconds will be automatically aborted by SDB.
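Lexicographic comparison has a practical consequence for numbers: as strings, '9' sorts after '10'. A common remedy is to zero-pad numbers to a fixed width before storing them. The sketch below illustrates the idea for non-negative integers; encode_int and the chosen widths are hypothetical and not part of SDB or boto.

```python
def encode_int(value, width=10):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if value < 0:
        raise ValueError("negative numbers need an offset before padding")
    text = "%0*d" % (width, value)
    if len(text) > width:
        raise ValueError("value does not fit in %d digits" % width)
    return text


raw = ["9", "10", "200"]
padded = [encode_int(int(s), width=5) for s in raw]

print(sorted(raw))     # ['10', '200', '9']: string order, not numeric order
print(sorted(padded))  # ['00009', '00010', '00200']: numeric order preserved
print("9" < "10")                            # False
print(encode_int(9, 5) < encode_int(10, 5))  # True
```

The width has to be chosen up front and applied to every stored value and every query value, because mixing padded and unpadded strings reintroduces the ordering problem.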
Listing 14. Create some test data
>>> d2 = sdb_conn.create_domain('devworks-dom-2')
>>>
>>> i1 = d2.new_item('car1')
>>>
>>> i1['make']= 'BMW'
>>> i1['color']='grey'
>>> i1['year']='2008'
>>> i1['desc']='Sedan'
>>> i1['model']='530i'
>>>
>>> i2 = d2.new_item('car2')
>>>
>>> i2['make']= 'BMW'
>>> i2['color']='white'
>>> i2['year']='2007'
>>> i2['desc']='Sports Utility Vehicle'
>>> i2['model']='X5'
>>>
Listing 15. Query with a single predicate
>>> rs = d2.query("['make' = 'BMW']")
>>> for result in rs:
...     print result.name
...
car1
car2
>>>
Listing 16. Query with multiple predicates
>>> rs = d2.query("['make' = 'BMW'] intersection ['year' = '2007']")
>>> for result in rs:
...     print result.name
...
car2
>>>
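The set semantics behind Listings 15 and 16 can be reproduced locally with ordinary Python sets. The sketch below is an in-memory model only, using a copy of the test data from Listing 14. It does not talk to SDB, handles just equality predicates on single-valued attributes, and predicate is a hypothetical helper name.

```python
# In-memory copy of the test data from Listing 14.
cars = {
    "car1": {"make": "BMW", "color": "grey", "year": "2008",
             "desc": "Sedan", "model": "530i"},
    "car2": {"make": "BMW", "color": "white", "year": "2007",
             "desc": "Sports Utility Vehicle", "model": "X5"},
}


def predicate(items, attribute, value):
    """Evaluate one equality predicate and return the matching item names."""
    return set(name for name, attrs in items.items()
               if attrs.get(attribute) == value)


bmw = predicate(cars, "make", "BMW")
from_2007 = predicate(cars, "year", "2007")
grey = predicate(cars, "color", "grey")

print(sorted(bmw))               # ['car1', 'car2']
print(sorted(bmw & from_2007))   # intersection: ['car2']
print(sorted(from_2007 | grey))  # union: ['car1', 'car2']
```

Each predicate yields a set of item names on its own, and the set operators then combine those sets, which mirrors how the text describes query evaluation.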
The query language provides support for a variety of comparison operators. It lets you
perform range queries and multi-valued attribute queries. To get a good grasp of all the
possibilities and best practices for creating queries and fine-tuning them for best
performance, it's highly recommended that you review the introductory articles on the
query language provided by Amazon Web Services.
You can also retrieve the metadata for a domain that gives you the total number of items
in the domain (in addition to other data).
Listing 17. Metadata for a domain
>>> my_domain = sdb_conn.get_domain('devworks-dom-2')
>>>
>>> my_metadata = my_domain.get_metadata()
>>>
>>> print my_metadata.item_count
2
>>> print my_metadata.item_names_size
8
>>> print my_metadata.attr_value_count
10
>>> print my_metadata.attr_names_size
22
>>> print my_metadata.attr_values_size
56
>>> print my_metadata.timestamp
1231798889
>>>
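The figures in Listing 17 are consistent with simple counts over the test data from Listing 14. The sketch below recomputes them locally as a cross-check, assuming ASCII data so that one character is one byte. domain_stats is a hypothetical helper, not a boto call, and it is not a statement of how SDB computes its metadata internally.

```python
# In-memory copy of the test data from Listing 14.
cars = {
    "car1": {"make": "BMW", "color": "grey", "year": "2008",
             "desc": "Sedan", "model": "530i"},
    "car2": {"make": "BMW", "color": "white", "year": "2007",
             "desc": "Sports Utility Vehicle", "model": "X5"},
}


def domain_stats(items):
    """Recompute domain metadata figures from {item_name: {attr: value}}."""
    unique_names = set(n for attrs in items.values() for n in attrs)
    return {
        "item_count": len(items),
        "item_names_size": sum(len(name) for name in items),
        "attr_value_count": sum(len(attrs) for attrs in items.values()),
        "attr_names_size": sum(len(n) for n in unique_names),
        "attr_values_size": sum(len(v) for attrs in items.values()
                                for v in attrs.values()),
    }


stats = domain_stats(cars)
for key in sorted(stats):
    print("%s = %d" % (key, stats[key]))
```

Note that attr_names_size counts each distinct attribute name once (make, color, year, desc, model), while attr_values_size counts every stored value.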
Conclusion
This article introduced you to Amazon's SDB service. You learned some of the basic
concepts and explored some of the functions provided by boto, an open source Python
library for interacting with SDB.