Cloud computing with Amazon Web Services, Part 5: Dataset processing in the cloud with SimpleDB

Amazon SimpleDB

Amazon SimpleDB (SDB) is a fast, scalable, real-time dataset indexing and querying service that
makes it easy to store and retrieve structured data for your Amazon Web Services-based
applications. It's designed to work well with the other Amazon Web Services, such as
Elastic Compute Cloud (EC2) and Simple Storage Service (S3), so you can build your
entire application stack within the Amazon Web Services environment. You pay only for
what you use, and a free tier of service is also available.

Some valuable features provided by SDB:

Reliability
       SDB is designed to store your indexed data redundantly across multiple data
       centers and to make it available at all times.
Speed
       SDB is designed to provide quick retrieval of data, especially for requests made
       from an EC2 instance within the Amazon Web Services environment.
Simplicity
       The programming model for accessing and using SDB is simple and can be used
       from a variety of programming languages.
Security
       SDB is designed to provide a high level of security. Access to the data is
       restricted to authorized users.
Flexibility
       SDB gives you the ability to store data on the fly without any need for pre-defined
       schemas.
Inexpensive
       SDB charges are low, and you are charged only for the resources you actually
       use.

The rest of this section explores the concepts that underpin SDB.

Domains

A domain is a container that lets you store your structured data and run queries against it.
The data is stored in a domain as items. Conceptually, a domain is similar to a worksheet
tab in a spreadsheet; items are rows in the spreadsheet. You can run queries against a
domain, but you cannot yet query across domains in the current version of SDB.

Each domain has the following metadata associated with it:

   •   Date and time the metadata was last updated
   •   Total number of items in the domain
   •   Total number of attribute name-value pairs in the domain
   •   Number of unique attribute names in the domain
   •   Total size of all item names in the domain, in bytes
   •   Total size of all attribute values in the domain, in bytes
   •   Total size of all unique attribute names in the domain, in bytes

SDB, like Simple Queue Service (SQS), follows the "eventual consistency" model. SDB
maintains multiple copies of each domain for fault tolerance, and every change made to a
domain is propagated to all copies.

Because this propagation can take a few seconds, depending on system load and
network latency, a consumer of your domain may not see a change immediately.
The change will eventually reach every copy, but this delay is an important
consideration when designing your SDB-based applications. Amazon CTO Werner Vogels
discusses the reasoning behind eventual consistency on his blog.
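When a process must read back a change it has just written, one common way to live with this delay is to poll until the change appears, giving up after a time limit. The sketch below is a generic, library-independent helper for Python 3; `wait_until_visible`, its parameters, and the simulated reads are hypothetical illustrations, not part of boto or SDB. Note that one successful read does not guarantee that every other reader sees the change at that same moment.

```python
import time


def wait_until_visible(fetch, is_visible, timeout=10.0, interval=0.5,
                       sleep=time.sleep, clock=time.monotonic):
    """Call fetch() repeatedly until is_visible(result) returns True.

    Returns the first result that passes the check, or raises
    TimeoutError once `timeout` seconds have elapsed without success.
    """
    deadline = clock() + timeout
    while True:
        result = fetch()
        if is_visible(result):
            return result
        if clock() >= deadline:
            raise TimeoutError("change not visible after %.1f seconds" % timeout)
        # Wait before reading again; other copies may still be catching up.
        sleep(interval)


# Simulated reads: the first two return stale data, the third sees the update.
reads = iter([{}, {}, {"cars": "Honda"}])
latest = wait_until_visible(lambda: next(reads),
                            lambda attrs: attrs.get("cars") == "Honda",
                            sleep=lambda seconds: None)
print(latest)
```

Against SDB itself, `fetch` could wrap a read such as `sdb_conn.get_attributes('devworks-dom-1', 'test_item_1')` from Listing 11; injecting `sleep` and `clock` keeps the helper easy to test.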

Items

Items represent individual objects within your domains, and they contain attributes with
values. Each item is conceptually similar to a row in a spreadsheet: an attribute is a
column and the values are cells. Unlike a spreadsheet cell, however, an attribute is not
restricted to a single value; it can have multiple values. SDB automatically indexes all
the data in your domains, regardless of how it is structured.

SDB also has a time limit for executing any single query against your domains. If a query
takes longer than 5 seconds, SDB will stop the query and return an error.

Domains in SDB are flexible and don't have fixed schemas. Each item within a domain
can have its own set of up to 256 attribute name-value pairs, and those attributes can be
completely different from the attributes of the other items in that domain.
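Because an attribute can hold several values, the per-item limit counts name-value pairs rather than distinct attribute names: each value of a multi-valued attribute is one pair. The sketch below models an item as a plain Python dict, with multi-valued attributes as lists; that representation and the helper names are assumptions for illustration, not boto's `Item` class.

```python
# Table 1 limit: at most 256 attribute name-value pairs per item.
MAX_PAIRS_PER_ITEM = 256


def count_name_value_pairs(item):
    """Count name-value pairs in an item modelled as a dict.

    A single-valued attribute contributes one pair; a multi-valued
    attribute (stored here as a list, tuple, or set) contributes one
    pair per value.
    """
    total = 0
    for value in item.values():
        if isinstance(value, (list, tuple, set)):
            total += len(value)
        else:
            total += 1
    return total


def fits_item_limit(item, limit=MAX_PAIRS_PER_ITEM):
    """True if the item stays within the per-item pair limit."""
    return count_name_value_pairs(item) <= limit


# The item built in Listing 8: one single-valued and one three-valued attribute.
item = {"cars": "BMW", "fruits": ["apple", "orange", "mango"]}
print(count_name_value_pairs(item))  # 4
```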

Limitations

The current version of SDB has limitations that you should consider when designing your
application. Table 1 shows the limitations (as specified by Amazon in its latest
documentation).


Table 1. Current limitations
Parameter                                            Restriction
Domain size                                          10 GB per domain
                                                     250,000,000 attribute name-value pairs per domain
Domain name                                          3-255 characters (a-z, A-Z, 0-9, '_', '-', and '.')
Domains per Amazon Web Services account              100
Attribute name-value pairs per item                  256
Attribute name length                                1024 bytes
Attribute value length                               1024 bytes
Allowed characters in attribute names and values     UTF-8 characters that are valid in XML documents;
                                                     control characters and any sequences that are
                                                     not valid in XML are not allowed
Attributes per PutAttributes operation               100
Attributes requested per Select or                   256
  QueryWithAttributes operation
Maximum items in query response                      256
Maximum query execution time                         5 seconds
Maximum predicates per query expression              10
Maximum comparisons per query expression predicate   10
Maximum unique attributes per select expression      20
Maximum comparisons per select expression            20
Maximum response size for QueryWithAttributes        1 MB
  and Select
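Some of these limits are easy to check on the client before sending a request. The sketch below encodes the domain-name rule and the 1024-byte name and value lengths from Table 1; the helper names are hypothetical, lengths are measured in UTF-8 bytes (an assumption consistent with the "bytes" units in the table), and it does not attempt the XML-validity check on characters.

```python
import re

# Table 1: 3-255 characters drawn from a-z, A-Z, 0-9, '_', '-', and '.'.
_DOMAIN_NAME_RE = re.compile(r"[A-Za-z0-9_.-]{3,255}")
MAX_NAME_BYTES = 1024
MAX_VALUE_BYTES = 1024


def is_valid_domain_name(name):
    """True if the whole name matches the Table 1 domain-name rule."""
    return _DOMAIN_NAME_RE.fullmatch(name) is not None


def attribute_problems(name, value):
    """Return a list of Table 1 length violations for one name-value pair."""
    problems = []
    if len(name.encode("utf-8")) > MAX_NAME_BYTES:
        problems.append("name longer than %d bytes" % MAX_NAME_BYTES)
    if len(value.encode("utf-8")) > MAX_VALUE_BYTES:
        problems.append("value longer than %d bytes" % MAX_VALUE_BYTES)
    return problems


print(is_valid_domain_name("devworks-dom-1"))  # True
print(is_valid_domain_name("bad name"))        # False: spaces are not allowed
```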

Pricing

Amazon provides a free tier for SDB, along with pricing for usage above the free tier
limit. Charges are based on:

    •     Machine utilization: the machine capacity used to complete each SDB request,
          normalized to the hourly capacity of a 1.7-GHz Xeon processor.
    •     Data transfer into and out of SDB.
    •     Structured data storage.

Free tier

There are no charges for the first 25 machine hours, 1 GB of data transfer, and 1 GB of
storage that you consume each month, at least until 1 Jun 2009. This is a substantial
free allowance, and many types of applications can run entirely within it. Table 2 shows
the pricing for machine utilization.

Table 2. Pricing for machine utilization
           Quantity                        Cost
First 25 machine hours         Free
Additional machine hours       $0.14 per machine hour

Table 3 lists the pricing for data transferred to and from SDB. There is no charge for
data transferred between SDB and other Amazon Web Services within the same region.
Data transferred between SDB and other Amazon Web Services in different regions is
charged at Internet data transfer rates on both sides of the transfer.


Table 3. Pricing for data transfer
Type of transfer     Cost
Data transfer in     First 1 GB free
                     $0.100 per GB for all additional data transfer in
Data transfer out    First 1 GB free
                     $0.170 per GB for the first 10 TB/month
                     $0.130 per GB for the next 40 TB/month
                     $0.110 per GB for the next 100 TB/month
                     $0.100 per GB for data transfer out over 150 TB/month

Table 4 outlines costs for structured data storage.


Table 4. Pricing for structured data storage
Amount of storage    Cost
All data storage     First 1 GB free
                     $0.25 per GB/month for all additional data storage


For the latest pricing, check Amazon SDB. You can also use the Simple Monthly
Calculator provided by Amazon for calculating your monthly usage costs for SDB and
the other Amazon Web Services.
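As a rough offline companion to that calculator, the sketch below applies the 2009 rates from Tables 2 to 4 for machine hours, storage, and data transfer in. It assumes storage is expressed as the average number of GB held over the month, and it deliberately ignores data transfer out, because the tables do not spell out how the free 1 GB interacts with the first tier. The constants and function names are illustrative, and the prices are historical.

```python
# 2009 prices from Tables 2-4, in US dollars; current prices will differ.
FREE_MACHINE_HOURS = 25
PRICE_PER_MACHINE_HOUR = 0.14
FREE_STORAGE_GB = 1
PRICE_PER_STORAGE_GB_MONTH = 0.25
FREE_TRANSFER_IN_GB = 1
PRICE_PER_TRANSFER_IN_GB = 0.10


def billable(amount, free_allowance):
    """Usage above the free allowance, never negative."""
    return max(0.0, amount - free_allowance)


def estimate_monthly_cost(machine_hours, storage_gb, transfer_in_gb):
    """Rough monthly SDB charge; data transfer out is deliberately ignored."""
    return (billable(machine_hours, FREE_MACHINE_HOURS) * PRICE_PER_MACHINE_HOUR
            + billable(storage_gb, FREE_STORAGE_GB) * PRICE_PER_STORAGE_GB_MONTH
            + billable(transfer_in_gb, FREE_TRANSFER_IN_GB) * PRICE_PER_TRANSFER_IN_GB)


# 30 machine hours, 3 GB stored, 2 GB transferred in:
# 5 * 0.14 + 2 * 0.25 + 1 * 0.10 = 1.30
print(round(estimate_monthly_cost(30, 3, 2), 2))
```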

Getting started with SDB
To start exploring SDB, you need to sign up for an Amazon Web Services account (see
Resources). See Part 2 of this series for detailed instructions on signing up for Amazon
Web Services. Once you have an Amazon Web Services account, you must enable the
Amazon SDB service for your account:

   1.   Log in to your Amazon Web Services account.
   2.   Navigate to the SDB home page.
   3.   Click Sign Up For This Web Service on the right side.
   4.   Provide the requested information and complete the sign-up process.

All communication with any of the Amazon Web Services is through either the SOAP
interface or the query interface. In this article, you use the query interface via a third-
party library to communicate with SDB.

Next, obtain your access keys from your Web Services Account information page by
selecting View Access Key Identifiers. You are now set up to use Amazon Web Services
and have enabled the SDB service for your account.

Interacting with SDB

For this example, you use a third-party open source Python library named boto to become
familiar with SDB by running small snippets of code in a Python shell.

Install boto and set up your environment

Download boto. The latest version, as of the writing of this article, was 1.6b. Unzip the
archive to the directory of your choice. Change into this directory and run setup.py to
install boto into your local Python environment, as shown in Listing 1.


Listing 1. Install boto
$ cd directory_where_you_unzipped_boto

$ python setup.py install



Set up environment variables that point to your Amazon Web Services access keys,
which are available from your Web Services Account information page.


Listing 2. Set up environment variables
# Export variables with your AWS access keys
$ export AWS_ACCESS_KEY_ID=Your_AWS_Access_Key_ID
$ export AWS_SECRET_ACCESS_KEY=Your_AWS_Secret_Access_Key

Check to make sure everything is set up correctly by starting a Python shell and
importing the boto library, as shown in Listing 3.


Listing 3. Check the setup
$ python
Python 2.4.5 (#1, Apr 12 2008, 02:18:19)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import boto
>>>



Explore SDB with boto

The SDBConnection class provides the main interface for interacting with SDB. In this
section, you use boto from the Python console, calling different methods on the
SDBConnection object and examining the responses returned by SDB. This will help
you get familiar with the API while you explore the concepts behind SDB.

The first step is to create a connection object to SDB using the Amazon Web Services
access keys you exported earlier to your environment. The boto library always checks the
environment first to see if these variables are set. If they are set, boto automatically uses
them when it creates the connection.
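If you prefer to fail early with a clear message when those variables are missing, you can read them yourself before connecting. The helper below is a hypothetical, stdlib-only sketch; it reads the same two variables exported in Listing 2 and returns them as a tuple.

```python
import os


def load_aws_credentials(env=None):
    """Return (access_key_id, secret_access_key) from the environment.

    Reads the variables exported in Listing 2 and raises RuntimeError,
    naming whatever is missing or empty, instead of failing later.
    """
    if env is None:
        env = os.environ
    names = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variable(s): " + ", ".join(missing))
    return env[names[0]], env[names[1]]
```

The returned pair could then be passed to `boto.connect_sdb(key_id, secret)`, assuming your boto version accepts the access key ID and secret key as its first two arguments, as boto 2.x does.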


Listing 4. Create a connection to SDB
>>> import boto
>>> sdb_conn = boto.connect_sdb()
>>>



For the rest of this article, you can use the sdb_conn object, created above, to interact
with SDB. You can create new domains by specifying a name for the domain.


Listing 5. Create a domain
>>> d1 = sdb_conn.create_domain('devworks-dom-1')
>>>



Retrieve a list of all your domains with get_all_domains(), as shown in Listing 6. It returns
a result set object that is essentially a Python list, so you can iterate over it and access each domain.


Listing 6. List all the domains
>>> all_domains = sdb_conn.get_all_domains()
>>>
>>> len(all_domains)
1
>>>
>>> for d in all_domains:
...     print d.name
...
devworks-dom-1



You can also retrieve a single domain by name.


Listing 7. Retrieve a single domain
>>> my_domain = sdb_conn.get_domain('devworks-dom-1')
>>>
>>> print my_domain.name
devworks-dom-1



Newly created domains are, of course, empty until you add items to them. You create a
new item within a domain, then add attributes to it.


Listing 8. Create new item
>>> my_domain = sdb_conn.get_domain('devworks-dom-1')
>>>
>>> i1 = my_domain.new_item('test_item_1')
>>>
>>> i1['cars'] = 'BMW'
>>>
>>> i1['fruits'] = ['apple', 'orange', 'mango']
>>>



Items can be retrieved from a domain by specifying the item name, which must be
unique within the domain. This is similar to the concept of a primary key in a relational database.


Listing 9. Retrieve an item and its attributes
>>> my_item = my_domain.get_item('test_item_1')
>>>
>>> print my_item
{u'cars': u'BMW', u'fruits': [u'apple', u'mango', u'orange']}
>>>



The item object returned above is a live Item object that will automatically retrieve all
attributes for this item from SDB when you access any of its attributes. Any updates
made to the values of the attributes for this item will be saved automatically to SDB.

Listing 10. Update attributes
>>> my_item['cars']
u'BMW'
>>>
>>> my_item['cars'] = 'Honda'
>>>
>>> my_item['cars']
'Honda'
>>>



You can also retrieve items and attributes by using the SDBConnection class and
specifying the domain and item names.


Listing 11. Retrieve an item using SDBConnection
>>> sdb_conn.get_attributes('devworks-dom-1', 'test_item_1')
{u'cars': u'Honda', u'fruits': [u'apple', u'mango', u'orange']}
>>>



SDB automatically deletes an item once it no longer has any attributes. You can also
explicitly delete an item and all of its attributes.


Listing 12. Delete an item and its attributes
>>> sdb_conn.get_attributes('devworks-dom-1', 'test_item_1')
{u'cars': u'Honda', u'fruits': [u'apple', u'mango', u'orange']}
>>>
>>> sdb_conn.delete_attributes('devworks-dom-1', 'test_item_1')
True
>>> sdb_conn.get_attributes('devworks-dom-1', 'test_item_1')
{}
>>>



You can also delete an entire domain, including all of the items it contains.


Listing 13. Delete a domain
>>> sdb_conn.delete_domain('devworks-dom-1')
True
>>>



Querying SDB domains
To search your structured data, SDB provides a custom query language for matching items
by their attribute name-value pairs. The basic building block of a query expression is the
predicate: an attribute name, a comparison operator, and a value to compare, enclosed in
square brackets. For example, the predicate ['desc' = 'Hello Devworks'] defines an
equality comparison on the attribute desc. Each predicate is evaluated separately and
produces a set of item names. You can combine multiple predicates using set operations
such as union and intersection to build complex queries.
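The set semantics can be illustrated without touching SDB at all. The sketch below models the two items that Listing 14 creates later in this section as Python dicts and evaluates equality predicates as sets of item names; `eq_predicate` and `values_of` are hypothetical helpers, not SDB or boto APIs. As an assumption about multi-valued attributes, a predicate matches an item if any of the attribute's values matches.

```python
# The two items from Listing 14, modelled as {item_name: {attribute: value}}.
ITEMS = {
    "car1": {"make": "BMW", "color": "grey", "year": "2008",
             "desc": "Sedan", "model": "530i"},
    "car2": {"make": "BMW", "color": "white", "year": "2007",
             "desc": "Sports Utility Vehicle", "model": "X5"},
}


def values_of(attrs, attribute):
    """Return an attribute's values as a list (empty if the attribute is absent)."""
    value = attrs.get(attribute)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def eq_predicate(items, attribute, value):
    """Model of ['attribute' = 'value']: the set of matching item names."""
    return {name for name, attrs in items.items()
            if value in values_of(attrs, attribute)}


bmw = eq_predicate(ITEMS, "make", "BMW")
year_2007 = eq_predicate(ITEMS, "year", "2007")
print(sorted(bmw & year_2007))  # intersection, as in Listing 16: ['car2']
print(sorted(bmw | year_2007))  # union: ['car1', 'car2']
```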

When using predicates in your queries, keep in mind that SDB performs all comparisons
lexicographically, that is, as string comparisons. You must therefore store your data in a
string representation that sorts correctly; for example, the string '10' sorts before '9'
unless numbers are zero-padded to a fixed width. Also remember that SDB automatically
aborts queries that take longer than 5 seconds.
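A common workaround is to store integers as fixed-width, zero-padded strings, adding a constant offset first so that negative numbers also sort correctly. The sketch below shows that encoding; the function names, the 10-digit width, and the offset values are illustrative choices, not SDB requirements.

```python
def encode_int(value, digits=10, offset=0):
    """Encode an integer as a zero-padded string that sorts numerically.

    `offset` is added first so that negative values become non-negative;
    pick an offset larger than the magnitude of the most negative value.
    """
    shifted = value + offset
    if shifted < 0 or len(str(shifted)) > digits:
        raise ValueError("%d does not fit in %d digits with offset %d"
                         % (value, digits, offset))
    return str(shifted).zfill(digits)


def decode_int(text, offset=0):
    """Invert encode_int."""
    return int(text) - offset


print("9" > "10")                      # True: plain strings compare digit by digit
print(encode_int(9) < encode_int(10))  # True once both are zero-padded
print(encode_int(-5, offset=1000))     # 0000000995
```

The same offset must be used for every value of an attribute, and for the values you compare against in queries, or the ordering breaks.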


Listing 14. Create some test data
>>> d2 = sdb_conn.create_domain('devworks-dom-2')
>>>
>>> i1 = d2.new_item('car1')
>>>
>>> i1['make'] = 'BMW'
>>> i1['color'] = 'grey'
>>> i1['year'] = '2008'
>>> i1['desc'] = 'Sedan'
>>> i1['model'] = '530i'
>>>
>>> i2 = d2.new_item('car2')
>>>
>>> i2['make'] = 'BMW'
>>> i2['color'] = 'white'
>>> i2['year'] = '2007'
>>> i2['desc'] = 'Sports Utility Vehicle'
>>> i2['model'] = 'X5'
>>>



Listing 15. Query with a single predicate
>>> rs = d2.query("['make' = 'BMW']")
>>> for result in rs:
...     print result.name
...
car1
car2
>>>



Listing 16. Query with multiple predicates
>>> rs = d2.query("['make' = 'BMW'] intersection ['year' = '2007']")
>>> for result in rs:
...     print result.name
...
car2
>>>



The query language provides support for a variety of comparison operators. It lets you
perform range queries and multi-valued attribute queries. To get a good grasp of all the
possibilities and best practices for creating queries and fine-tuning them for best
performance, it's highly recommended that you review the introductory articles on the
query language provided by Amazon Web Services.

You can also retrieve the metadata for a domain that gives you the total number of items
in the domain (in addition to other data).


Listing 17. Metadata for a domain
>>> my_domain = sdb_conn.get_domain('devworks-dom-2')
>>>
>>> my_metadata = my_domain.get_metadata()
>>>
>>> print my_metadata.item_count
2
>>> print my_metadata.item_names_size
8
>>> print my_metadata.attr_value_count
10
>>> print my_metadata.attr_names_size
22
>>> print my_metadata.attr_values_size
56
>>> print my_metadata.timestamp
1231798889
>>>
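To see what each metadata field measures, the sketch below recomputes the values from an in-memory copy of the two items created in Listing 14. It assumes that sizes are the UTF-8 byte lengths of the raw strings, an assumption that reproduces the numbers in Listing 17 (8, 10, 22, and 56). `domain_metadata` is a hypothetical helper, not a boto method. The `timestamp` value appears to be seconds since the Unix epoch, which the last line converts to a readable UTC time.

```python
from datetime import datetime, timezone

# The two items from Listing 14, modelled as {item_name: {attribute: value}}.
ITEMS = {
    "car1": {"make": "BMW", "color": "grey", "year": "2008",
             "desc": "Sedan", "model": "530i"},
    "car2": {"make": "BMW", "color": "white", "year": "2007",
             "desc": "Sports Utility Vehicle", "model": "X5"},
}


def utf8_len(text):
    return len(text.encode("utf-8"))


def domain_metadata(items):
    """Recompute domain metadata counts and sizes from an in-memory copy."""
    unique_names = set()
    value_count = 0
    values_size = 0
    for attrs in items.values():
        for name, value in attrs.items():
            unique_names.add(name)
            values = value if isinstance(value, list) else [value]
            value_count += len(values)
            values_size += sum(utf8_len(v) for v in values)
    return {
        "item_count": len(items),
        "item_names_size": sum(utf8_len(name) for name in items),
        "attr_value_count": value_count,
        "attr_name_count": len(unique_names),  # unique attribute names
        "attr_names_size": sum(utf8_len(name) for name in unique_names),
        "attr_values_size": values_size,
    }


meta = domain_metadata(ITEMS)
print(meta["item_names_size"], meta["attr_names_size"], meta["attr_values_size"])  # 8 22 56
print(datetime.fromtimestamp(1231798889, tz=timezone.utc).isoformat())
```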




Conclusion

This article introduced you to Amazon's SDB service. You learned some of the basic
concepts and explored some of the functions provided by boto, an open source Python
library for interacting with SDB.

Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
 
Forklift Operations: Safety through Cartoons
Forklift Operations: Safety through CartoonsForklift Operations: Safety through Cartoons
Forklift Operations: Safety through Cartoons
 
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
 
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
 
Value Proposition canvas- Customer needs and pains
Value Proposition canvas- Customer needs and painsValue Proposition canvas- Customer needs and pains
Value Proposition canvas- Customer needs and pains
 

Cloud Computing With Amazon Web Services, Part 5 Dataset Processing In The Cloud With Simple Db

Cloud computing with Amazon Web Services, Part 5: Dataset processing in the cloud with SimpleDB

Amazon SimpleDB

Amazon SimpleDB (SDB) is a fast, scalable, real-time dataset indexing and querying framework that makes it easy to store and retrieve structured data for your Amazon Web Services-based applications. It's designed to work well with the other Amazon Web Services, such as Elastic Compute Cloud (EC2) and Simple Storage Service (S3). SDB enables you to build your entire application stack within the Amazon Web Services environment. You pay for the service based entirely on your usage, and there is also a free tier of service available.

Some valuable features provided by SDB:

Reliability
       SDB is designed to store your indexed data redundantly across multiple data
       centers and to make it available at all times.
Speed
       SDB is designed to provide quick retrieval of data, especially if your requests are
       made from an EC2 instance within the Amazon Web Services environment.
Simplicity
       The programming model for accessing and using SDB is simple and can be used
       from a variety of programming languages.
Security
       SDB is designed to provide a high level of security. Access to the data is
       restricted to authorized users.
Flexibility
       SDB lets you store data on the fly without any need for predefined schemas.
Inexpensive
       SDB charges are quite economical. You are only charged for what you actually
       use.

The rest of this section explores the concepts that underpin SDB.

Domains

A domain is a container that lets you store your structured data and run queries against it. The data is stored in a domain as items. Conceptually, a domain is similar to a worksheet tab in a spreadsheet; items are rows in the spreadsheet. You can run queries against a single domain, but the current version of SDB does not support queries that span multiple domains.

Each domain has the following metadata associated with it:
  • Date and time the metadata was last updated
  • Number of items in the domain
  • Number of attribute name-value pairs in the domain
  • Number of unique attribute names in the domain
  • Total size of all item names in the domain, in bytes
  • Total size of all attribute values, in bytes
  • Total size of all unique attribute names, in bytes

SDB, like Simple Queue Service (SQS), follows the "eventual consistency" model. SDB maintains multiple copies of each domain for fault tolerance, and every change made to a domain is propagated across all copies. Because this propagation sometimes takes a few seconds, depending on system load and network latency, a consumer of your domain may not see a change immediately. Changes are eventually propagated throughout SDB, but this delay is an important consideration when designing your SDB-based applications. Amazon CTO Werner Vogels discusses the reasoning behind eventual consistency on his blog.

Items

Items represent individual objects within your domains, and they contain attributes with values. Each item is conceptually similar to a row in a spreadsheet: an attribute is a column and its values are cells. An attribute is not restricted to a single value; it can hold multiple values. SDB automatically indexes your domains regardless of how the data is structured.

SDB also has a time limit for executing any single query against your domains. If a query takes longer than 5 seconds, SDB stops the query and returns an error.

Domains in SDB are flexible and don't have fixed schemas. Each item within a domain can have its own set of up to 256 attribute name-value pairs, and its attributes can even be completely different from those of every other item in that domain.

Limitations

The current version of SDB has limitations that you should consider when designing your application. Table 1 shows the limitations (as specified by Amazon in its latest documentation).

Table 1. Current limitations

    Parameter                                                   Current restriction
    Domain size                                                 10 GB per domain; 250,000,000 attribute name-value pairs per domain
    Domain name                                                 3-255 characters (a-z, A-Z, 0-9, '_', '-', and '.')
    Domains per Amazon Web Services account                     100
    Attribute name-value pairs per item                         256
    Attribute name length                                       1024 bytes
    Attribute value length                                      1024 bytes
    Allowed characters in attribute names and values            UTF-8 characters that are valid in XML documents; control characters and any sequences that are not valid in XML are not allowed
    Attributes per PutAttributes operation                      100
    Attributes requested per Select or QueryWithAttributes      256
    Maximum items in query response                             256
    Maximum query execution time                                5 seconds
    Maximum predicates per query expression                     10
    Maximum comparisons per query expression predicate          10
    Maximum unique attributes per Select expression             20
    Maximum comparisons per Select expression                   20
    Maximum response size for QueryWithAttributes and Select    1 MB

Pricing

Amazon provides a free tier for SDB, along with pricing for usage above the free tier limit. The charges are based on:
  • Machine utilization: the machine capacity used to complete each SDB request, normalized to the hourly capacity of a 1.7-GHz Xeon processor.
  • Data transferred into and out of SDB.
  • Structured data storage.

Free tier

There is no charge for the first 25 machine hours, 1 GB of data transfer, and 1 GB of storage that you consume each month, at least until 1 Jun 2009. This is a significant amount of usage that Amazon provides for free for a limited time, and many types of applications can operate comfortably within this free tier. Table 2 shows example pricing.
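The tiered rates in Tables 2 through 4 lend themselves to a quick monthly estimate. The following is a minimal sketch, not an Amazon tool: `tiered_cost` and `estimate_monthly_cost` are hypothetical names, the rates are the historical ones quoted in this article, and it assumes that 1 TB = 1024 GB, that the free 1 GB of outbound transfer counts toward the first 10 TB tier, and that storage is the average number of GB stored during the month.

```python
from decimal import Decimal

TB = Decimal(1024)  # assumption: 1 TB = 1024 GB

# (cumulative upper bound, price per unit); None marks the open-ended last tier.
# Historical rates from Tables 2-4; check Amazon's current price list before relying on them.
MACHINE_HOURS = [(Decimal(25), Decimal("0")), (None, Decimal("0.14"))]
DATA_IN_GB = [(Decimal(1), Decimal("0")), (None, Decimal("0.100"))]
DATA_OUT_GB = [
    (Decimal(1), Decimal("0")),      # first 1 GB free (assumed to count toward the first 10 TB)
    (10 * TB, Decimal("0.170")),     # up to 10 TB
    (50 * TB, Decimal("0.130")),     # next 40 TB
    (150 * TB, Decimal("0.110")),    # next 100 TB
    (None, Decimal("0.100")),        # over 150 TB
]
STORAGE_GB_MONTH = [(Decimal(1), Decimal("0")), (None, Decimal("0.25"))]


def tiered_cost(quantity, tiers):
    """Charge each slice of `quantity` at the rate of the tier it falls in."""
    quantity = Decimal(str(quantity))
    cost = Decimal("0")
    lower = Decimal("0")
    for upper, rate in tiers:
        if quantity <= lower:
            break
        top = quantity if upper is None else min(quantity, upper)
        cost += (top - lower) * rate
        if upper is None:
            break
        lower = upper
    return cost


def estimate_monthly_cost(machine_hours, gb_in, gb_out, gb_stored):
    """Return a hypothetical monthly SDB bill in US dollars."""
    return (
        tiered_cost(machine_hours, MACHINE_HOURS)
        + tiered_cost(gb_in, DATA_IN_GB)
        + tiered_cost(gb_out, DATA_OUT_GB)
        + tiered_cost(gb_stored, STORAGE_GB_MONTH)
    )


if __name__ == "__main__":
    # Usage that stays inside the free tier.
    print(estimate_monthly_cost(25, 1, 1, 1))
    # 5 extra machine hours, 1 extra GB in, 2 extra GB out, 1 extra GB stored.
    print(estimate_monthly_cost(30, 2, 3, 2))
```

Using `Decimal` rather than `float` keeps the cent arithmetic exact; for the second call the sketch adds $0.70 + $0.10 + $0.34 + $0.25.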
Table 2. Pricing for machine utilization

    Quantity                    Cost
    First 25 machine hours      Free
    Additional machine hours    $0.14 per machine hour

Table 3 addresses the amount of data transferred to and from SDB. There is no charge for data transferred between SDB and other Amazon Web Services within the same region. Data transferred between SDB and other Amazon Web Services across regions is charged at Internet data transfer rates on both sides of the transfer.

Table 3. Pricing for data transfer

    Type of transfer          Cost
    All data transfer in      First 1 GB of data transfer in is free
                              $0.100 per GB, all additional data transfer in
    All data transfer out     First 1 GB of data transfer out is free
                              $0.170 per GB, first 10 TB/month data transfer out
                              $0.130 per GB, next 40 TB/month data transfer out
                              $0.110 per GB, next 100 TB/month data transfer out
                              $0.100 per GB, data transfer out/month over 150 TB

Table 4 outlines costs for structured data storage.

Table 4. Structured data storage

    Amount of storage     Cost
    All data storage      First 1 GB of data is free
                          $0.25 per GB/month, all additional data storage

For the latest pricing, check Amazon SDB. You can also use the Simple Monthly Calculator provided by Amazon to calculate your monthly usage costs for SDB and the other Amazon Web Services.

Getting started with SDB
To start exploring SDB, you need to sign up for an Amazon Web Services account (see Resources). See Part 2 of this series for detailed instructions on signing up for Amazon Web Services. Once you have an Amazon Web Services account, you must enable the Amazon SDB service for your account:
  1. Log in to your Amazon Web Services account.
  2. Navigate to the SDB home page.
  3. Click Sign Up For This Web Service on the right side.
  4. Provide the requested information and complete the sign-up process.

All communication with any of the Amazon Web Services is through either the SOAP interface or the query interface. In this article, you use the query interface via a third-party library to communicate with SDB. You will also need your access keys, which are available from your Web Services Account information page by selecting View Access Key Identifiers.

You are now set up to use Amazon Web Services and have enabled the SDB service for your account.

Interacting with SDB

For this example, you use boto, a third-party open source Python library, to become familiar with SDB by running small snippets of code in a Python shell.

Install boto and set up your environment

Download boto. The latest version, as of the writing of this article, was 1.6b. Unzip the archive to the directory of your choice. Change into this directory and run setup.py to install boto into your local Python environment, as shown in Listing 1.

Listing 1. Install boto
    $ cd directory_where_you_unzipped_boto
    $ python setup.py install

Set up environment variables that hold your Amazon Web Services access keys, which are available from the Web Services Account information page.

Listing 2. Set up environment variables
    # Export variables with your AWS access keys
    $ export AWS_ACCESS_KEY_ID=Your_AWS_Access_Key_ID
    $ export AWS_SECRET_ACCESS_KEY=Your_AWS_Secret_Access_Key
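Because boto looks for these two variables in the environment, a short pre-flight check can catch a missing key before you try to connect. This is a hypothetical helper, not part of boto; it only inspects the variable names exported in Listing 2 and never contacts Amazon.

```python
import os

# The two variable names exported in Listing 2.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_credentials(environ=None):
    """Return the required credential variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_credentials()
    if missing:
        print("Export these before connecting: " + ", ".join(missing))
    else:
        print("AWS credentials found in the environment.")
```

Passing a plain dict as `environ` makes the check easy to exercise without touching your real shell environment.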
Check that everything is set up correctly by starting a Python shell and importing the boto library, as shown in Listing 3.

Listing 3. Check the setup
    $ python
    Python 2.4.5 (#1, Apr 12 2008, 02:18:19)
    [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import boto
    >>>

Explore SDB with boto

The SDBConnection class provides the main interface for interacting with SDB. You will use boto from the Python console. The examples call different methods on the SDBConnection object and examine the responses returned by SDB, which will help you get familiar with the API while you explore the concepts behind SDB.

The first step is to create a connection object to SDB using the Amazon Web Services access keys you exported earlier to your environment. The boto library always checks the environment first to see if these variables are set. If they are set, boto automatically uses them when it creates the connection.

Listing 4. Create a connection to SDB
    >>> import boto
    >>> sdb_conn = boto.connect_sdb()
    >>>

For the rest of this article, you can use the sdb_conn object, created above, to interact with SDB. You can create a new domain by specifying a name for it.

Listing 5. Create a domain
    >>> d1 = sdb_conn.create_domain('devworks-dom-1')
    >>>

Retrieve a list of all your domains, which returns a result set object that is essentially a Python list, as shown in Listing 6. You can iterate over this list and access each domain.

Listing 6. List all the domains
    >>> all_domains = sdb_conn.get_all_domains()
    >>> len(all_domains)
    1
    >>> for d in all_domains:
    ...     print(d.name)
    ...
    devworks-dom-1

You can also retrieve a single domain by name.

Listing 7. Retrieve a single domain
    >>> my_domain = sdb_conn.get_domain('devworks-dom-1')
    >>> print(my_domain.name)
    devworks-dom-1

Newly created domains are, of course, empty until you add items to them. You create a new item within a domain, then add attributes to it.

Listing 8. Create a new item
    >>> my_domain = sdb_conn.get_domain('devworks-dom-1')
    >>> i1 = my_domain.new_item('test_item_1')
    >>> i1['cars'] = 'BMW'
    >>> i1['fruits'] = ['apple', 'orange', 'mango']
    >>>

Items can be retrieved from a domain by specifying the item name, which must be unique within the domain. This is similar to the concept of a primary key in a relational database.

Listing 9. Retrieve an item and its attributes
    >>> my_item = my_domain.get_item('test_item_1')
    >>> print(my_item)
    {u'cars': u'BMW', u'fruits': [u'apple', u'mango', u'orange']}
    >>>

Notice that the values of the multi-valued fruits attribute come back in a different order than they were stored; SDB does not guarantee the order of an attribute's values.

The item object returned above is a live Item object that automatically retrieves all attributes for this item from SDB when you access any of its attributes. Any updates made to the values of this item's attributes are saved automatically to SDB.
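The data model described so far (domains hold items, items hold attributes, attributes hold one or more values) can be pictured as nested Python mappings. The class below is a hypothetical, purely local illustration; it is not boto's API and not how SDB stores data internally. It keeps each attribute's values in a set, which is consistent with the reordered fruits values in Listing 9, and it computes the domain metadata aggregates listed under Domains, assuming sizes are counted in UTF-8 bytes.

```python
class ToyDomain:
    """Hypothetical in-memory model of an SDB domain; not boto's API.

    Items map attribute names to *sets* of string values, so a multi-valued
    attribute has no inherent order.
    """

    def __init__(self, name):
        self.name = name
        self.items = {}  # item name -> {attribute name -> set of values}

    def put(self, item_name, attrs):
        """Create or update an item, replacing the values of each given attribute."""
        item = self.items.setdefault(item_name, {})
        for attr, value in attrs.items():
            values = value if isinstance(value, (list, tuple, set)) else [value]
            item[attr] = {str(v) for v in values}

    def delete(self, item_name):
        """Remove an item and all of its attributes, if it exists."""
        self.items.pop(item_name, None)

    def metadata(self):
        """Aggregate counts and sizes like those SDB reports (sizes in UTF-8 bytes)."""
        unique_names = {attr for item in self.items.values() for attr in item}
        values = [v for item in self.items.values() for vals in item.values() for v in vals]

        def nbytes(strings):
            return sum(len(s.encode("utf-8")) for s in strings)

        return {
            "item_count": len(self.items),
            "item_names_size": nbytes(self.items),
            "attr_value_count": len(values),      # one per attribute name-value pair
            "attr_name_count": len(unique_names),
            "attr_names_size": nbytes(unique_names),
            "attr_values_size": nbytes(values),
        }


if __name__ == "__main__":
    dom = ToyDomain("devworks-dom-1")
    dom.put("test_item_1", {"cars": "BMW", "fruits": ["apple", "orange", "mango"]})
    print(sorted(dom.items["test_item_1"]["fruits"]))  # ['apple', 'mango', 'orange']
    print(dom.metadata()["attr_value_count"])          # 4
```

Loading the two cars from Listing 14 into this model yields the same counts and sizes that SDB reports in Listing 17: 2 items, 8 bytes of item names, 10 name-value pairs, 22 bytes of attribute names, and 56 bytes of values.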
Listing 10. Update attributes
    >>> my_item['cars']
    u'BMW'
    >>> my_item['cars'] = 'Honda'
    >>> my_item['cars']
    'Honda'
    >>>

You can also retrieve items and attributes by using the SDBConnection class and specifying the domain and item names.

Listing 11. Retrieve an item using SDBConnection
    >>> sdb_conn.get_attributes('devworks-dom-1', 'test_item_1')
    {u'cars': u'Honda', u'fruits': [u'apple', u'mango', u'orange']}
    >>>

SDB automatically deletes an item that no longer has any attributes. You can also explicitly delete an item and its attributes.

Listing 12. Delete an item and its attributes
    >>> sdb_conn.get_attributes('devworks-dom-1', 'test_item_1')
    {u'cars': u'Honda', u'fruits': [u'apple', u'mango', u'orange']}
    >>> sdb_conn.delete_attributes('devworks-dom-1', 'test_item_1')
    True
    >>> sdb_conn.get_attributes('devworks-dom-1', 'test_item_1')
    {}
    >>>

When you no longer need a domain, you can delete it by name.

Listing 13. Delete a domain
    >>> sdb_conn.delete_domain('devworks-dom-1')
    True
    >>>

Querying SDB domains
To search your structured data, SDB provides a custom query language for matching items by their attribute name-value pairs. The basic building block of a query expression is the predicate. Each predicate is enclosed in square brackets and contains an attribute name, a comparison operator, and a value to compare against. For example, the predicate ['desc' = 'Hello Devworks'] defines an equality comparison on the attribute desc. Each predicate is evaluated separately and produces a set of item names. You can combine multiple predicates using set operations such as union and intersection to build complex queries.

When using predicates in your queries, keep in mind that SDB performs all comparisons lexicographically, as string comparisons. You must therefore store your data in a string representation that sorts correctly; for example, as strings '9' sorts after '10', so numbers should be zero-padded to a fixed width. Also keep in mind that SDB automatically aborts any query that takes longer than 5 seconds.

Listing 14. Create some test data
    >>> d2 = sdb_conn.create_domain('devworks-dom-2')
    >>> i1 = d2.new_item('car1')
    >>> i1['make'] = 'BMW'
    >>> i1['color'] = 'grey'
    >>> i1['year'] = '2008'
    >>> i1['desc'] = 'Sedan'
    >>> i1['model'] = '530i'
    >>> i2 = d2.new_item('car2')
    >>> i2['make'] = 'BMW'
    >>> i2['color'] = 'white'
    >>> i2['year'] = '2007'
    >>> i2['desc'] = 'Sports Utility Vehicle'
    >>> i2['model'] = 'X5'
    >>>

Listing 15. Query with a single predicate
    >>> rs = d2.query("['make' = 'BMW']")
    >>> for result in rs:
    ...     print(result.name)
    ...
    car1
    car2
    >>>

Listing 16. Query with multiple predicates
    >>> rs = d2.query("['make' = 'BMW'] intersection ['year' = '2007']")
    >>> for result in rs:
    ...     print(result.name)
    ...
    car2
    >>>

The query language supports a variety of comparison operators, and it lets you perform range queries and queries on multi-valued attributes. To get a good grasp of all the possibilities, and of best practices for writing queries and tuning them for performance, review the introductory articles on the query language provided by Amazon Web Services.

You can also retrieve the metadata for a domain, which gives you the total number of items in the domain along with the other metadata values described in the Domains section.

Listing 17. Metadata for a domain
    >>> my_domain = sdb_conn.get_domain('devworks-dom-2')
    >>> my_metadata = my_domain.get_metadata()
    >>> print(my_metadata.item_count)
    2
    >>> print(my_metadata.item_names_size)
    8
    >>> print(my_metadata.attr_value_count)
    10
    >>> print(my_metadata.attr_names_size)
    22
    >>> print(my_metadata.attr_values_size)
    56
    >>> print(my_metadata.timestamp)
    1231798889
    >>>

Conclusion

This article introduced you to Amazon's SDB service. You learned some of the basic concepts and explored some of the functions provided by boto, an open source Python library for interacting with SDB.
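The predicate semantics described in the querying section can be modeled locally to see why intersection narrows a result and why lexicographic comparison matters. The sketch below is hypothetical and is not SDB's query parser: it supports only single-comparison predicates, compares everything as strings, and assumes a predicate matches an item when any value of a multi-valued attribute satisfies the comparison. The names `predicate`, `zero_pad`, and `CARS` are illustrative only.

```python
import operator

# Subset of comparison operators for illustration; all comparisons are on strings.
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}

# Sample data mirroring Listing 14: item name -> attribute -> list of values.
CARS = {
    "car1": {"make": ["BMW"], "color": ["grey"], "year": ["2008"],
             "desc": ["Sedan"], "model": ["530i"]},
    "car2": {"make": ["BMW"], "color": ["white"], "year": ["2007"],
             "desc": ["Sports Utility Vehicle"], "model": ["X5"]},
}


def predicate(items, attr, op, value):
    """Return the set of item names with at least one value of `attr` matching."""
    compare = OPS[op]
    return {name for name, attrs in items.items()
            if any(compare(v, value) for v in attrs.get(attr, []))}


def zero_pad(n, width=10):
    """Encode a non-negative integer so that string order equals numeric order."""
    if n < 0:
        raise ValueError("negative numbers need an offset before padding")
    return str(n).zfill(width)


if __name__ == "__main__":
    # Equivalent of ['make' = 'BMW'] intersection ['year' = '2007'] (compare Listing 16).
    print(predicate(CARS, "make", "=", "BMW") & predicate(CARS, "year", "=", "2007"))
    # Unpadded strings compare character by character, so "9" sorts after "10".
    print("9" > "10", zero_pad(9) > zero_pad(10))  # True False
```

Because each predicate yields a plain set of item names, Python's `&` and `|` operators stand in for intersection and union; a range query is simply a predicate with `<`, `<=`, `>`, or `>=` on suitably padded strings.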