2. Contents
• Introduction
• Cloud Computing platforms
• Programming for the Cloud
• Semantic Web on the Cloud
Cloud Computing Apr 2010 #2
3. Contents
Part I
Introduction
4. Cloud Computing - NIST definition
• “Cloud computing is a model for enabling ubiquitous, convenient, on-
demand network access to a shared pool of configurable computing
resources (e.g., networks, servers, storage, applications, and services)
that can be rapidly provisioned and released with minimal management
effort or service provider interaction.”
• Delivery models
– IaaS (Infrastructure as a Service) - the consumer uses "fundamental
resources" such as processing power, storage, networking components or
middleware. The consumer can control the operating system, storage,
applications and possibly networking
– PaaS (Platform as a Service) - the consumer uses a hosting environment for
their applications and has control over the applications (and some control
over the hosting environment), but does not control the infrastructure on
which they are running
– SaaS (Software as a Service) - the consumer uses an application, but does not
control the infrastructure on which it's running (OS, hardware)
6. Cloud Computing - Essential characteristics (NIST)
• Rapid elasticity – the ability to scale resources both up and down as
needed. To the consumer, the Cloud appears to be infinite, and the
consumer can purchase as much / little computing power as they need
• Measured service – aspects of the Cloud service are controlled and
monitored by the Cloud provider. This is crucial for billing, access control,
resource optimization & capacity planning
• On-demand self service – a consumer can use cloud services as needed
without any human interaction with the cloud provider
• Ubiquitous network access – the Cloud provider’s capabilities are
available over the network and can be accessed through standard
mechanisms
• Resource Pooling – allows a Cloud provider to serve its consumers via a
multi-tenant model - resources are (re)assigned according to consumer
demand.
7. Cloud Computing - deployment models (NIST)
• Public cloud
– Infrastructure owned by one organisation, with capacity sold to third parties
– E.g. Amazon Web Services, Google AppEngine, Windows Azure
• Private cloud
– Internal infrastructure for a single organisation (on or off-premise)
– E.g. VMware vCloud, IBM Cloudburst, Microsoft Hyper-V
• Community cloud
– Infrastructure shared by several organisations, targeting a specific
community
– E.g. OpenCirrus (HP, Intel, Yahoo, KIT, CMU, …)
• Hybrid cloud
– Composition of the above
– E.g. AWS Virtual Private Cloud
8. Cloud computing – business drivers
1. Business agility
– Faster time to market
• No major upfront commitment & investment in infrastructure
– Scalability & elasticity
• Instant on-demand provisioning
• Shifting the risk of over-/under-provisioning to the cloud provider
2. Focus
– Outsource non-core tasks to the cloud provider
3. Pay-as-you-go
– Speed up new project launching & rollout (start small, add resources
when needed)
– No need for complex planning ahead
– Turn fixed costs (CapEx) into variable costs (OpEx)
9. Some cloud use cases
• Overflow buffer
– Provision for the average load rather than the peak; overflow goes to the cloud
• Seasonal business
– E.g. Walmart has 4:1 peak-to-average ratio (source?)
• Time-to-market for small startups
– Less upfront investment, more focus on core competencies
• Experimental playground
– Rollout experimental projects without major equipment purchases
• Speedup of large scale batch operations
– 1000 servers for 1 hour cost the same as 1 server for 1000 hours
– More cost-efficient computing (off-peak tariffs & time zones)
• Unforeseeable events
– E.g. sudden traffic spikes to web sites (volcanoes, anyone?)
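The batch-speedup and overflow-buffer points above are plain arithmetic under per-server-hour billing. A minimal sketch, assuming a flat hourly rate (the rate below is a placeholder, not a quoted price), work that splits evenly across servers, and capacity sized exactly for the peak:

```python
def batch_cost(servers: int, hours_each: float, rate_per_server_hour: float) -> float:
    """Cost of a batch job billed per server-hour."""
    return servers * hours_each * rate_per_server_hour


def wall_clock_hours(total_server_hours: float, servers: int) -> float:
    """Elapsed time if the work splits evenly across servers (an idealisation)."""
    return total_server_hours / servers


def idle_fraction_when_sized_for_peak(peak_to_average: float) -> float:
    """Share of owned capacity that sits idle on average if you buy for the peak."""
    return 1 - 1 / peak_to_average


RATE = 0.10  # hypothetical $/server-hour, for illustration only

one_server = batch_cost(1, 1000, RATE)      # 1 server for 1000 hours
many_servers = batch_cost(1000, 1, RATE)    # 1000 servers for 1 hour
```

Both calls cost the same, but `wall_clock_hours(1000, 1000)` is 1 hour instead of 1000. With the 4:1 ratio quoted above, `idle_fraction_when_sized_for_peak(4)` is 0.75, meaning three quarters of peak-sized capacity would be idle on average.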
10. Cloud-able applications
• Typical characteristics
– Non mission critical
– Need >99% uptime
– Low bandwidth / higher latency tolerance
– Relaxed security requirements
– Few integration points
– E.g.
• Batch operations (speedup at the same price!)
• One-time large scale processing
• Barriers to cloud migration
– Security & trust
– Lack of SLA
– Lack of standardization (vendor lock-in)
15. Amazon Web Services (2)
• Simple Storage Service (S3)
– Eventually consistent blob storage (SLA available)
– Max 5GB per object, REST+SOAP API
– Storage $0.15/GB/mo, transfer $0.15/GB, $0.10 per 100K API calls
• Elastic Compute Cloud (EC2)
– Xen VM, Amazon Machine Image (AMI), no SLA
• Elastic Block Storage (EBS)
– Up to 1TB storage to be used by EC2 instances (attached devices)
– Raw/unformatted block devices (create your own filesystem on top)
– Replicated
– $0.10/GB/mo, $0.10 per 1 million I/O ops (iostat)
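To make the S3 price list above concrete, here is a rough monthly estimate. It is a sketch only: it uses the April 2010 figures exactly as listed on the slide, and it assumes a single flat rate for all transfer and all API calls with no pricing tiers. The function name and parameters are illustrative, not part of any AWS API.

```python
from decimal import Decimal

# Prices as listed on the slide (April 2010); treat them as historical.
S3_STORAGE_PER_GB_MONTH = Decimal("0.15")
S3_TRANSFER_PER_GB = Decimal("0.15")
S3_PER_100K_API_CALLS = Decimal("0.10")


def s3_monthly_cost(stored_gb: int, transferred_gb: int, api_calls: int) -> Decimal:
    """Estimated monthly S3 bill in dollars (flat rates, no tiers: an assumption)."""
    storage = Decimal(stored_gb) * S3_STORAGE_PER_GB_MONTH
    transfer = Decimal(transferred_gb) * S3_TRANSFER_PER_GB
    requests = Decimal(api_calls) / Decimal(100_000) * S3_PER_100K_API_CALLS
    return storage + transfer + requests


# 100 GB stored, 20 GB transferred, 1 million API calls in a month
estimate = s3_monthly_cost(100, 20, 1_000_000)
```

`Decimal` is used instead of `float` so that money amounts do not pick up binary rounding noise.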
16. Amazon Web Services (3)
• Simple Queue Service
– Persistent, reliable, secure, distributed queue (no SLA)
– Max message size 8KB, messages auto-deleted after 4 days
– Duplicate and out-of-order delivery may occur
– Price: $0.15/GB transfer, $0.10 per 100K API calls
• Simple Notification Service
– Reliable, secure & scalable pub/sub service (no SLA)
– Protocols: HTTP, e-mail, SQS
– Price: $0.15/GB transfer, $0.06 per 100K API calls, price per 100K
notifications: $0.06 (HTTP), $2.00 (e-mail), free (SQS)
• SimpleDB
– Distributed column store (built on Erlang)
– Consistent or eventually consistent reads, flexible schema
– $0.14/hour consumed, $0.15/GB transfer, $0.25/GB/mo storage
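Because the Simple Queue Service may deliver a message twice or out of order, consumers are usually written to tolerate both. The sketch below is an in-memory simulation and makes no AWS calls. The message fields `id`, `seq` and `body` are hypothetical, and it assumes the sender attaches its own sequence number.

```python
class IdempotentConsumer:
    """Applies each message at most once, whatever the delivery order."""

    def __init__(self):
        self.seen_ids = set()   # ids already processed
        self.applied = {}       # sender sequence number -> body

    def handle(self, message: dict) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        if message["id"] in self.seen_ids:
            return False
        self.seen_ids.add(message["id"])
        self.applied[message["seq"]] = message["body"]
        return True

    def in_order(self) -> list:
        """Bodies sorted by the sender-assigned sequence number."""
        return [self.applied[seq] for seq in sorted(self.applied)]


deliveries = [
    {"id": "m2", "seq": 2, "body": "second"},
    {"id": "m1", "seq": 1, "body": "first"},
    {"id": "m2", "seq": 2, "body": "second"},  # duplicate delivery
]
consumer = IdempotentConsumer()
results = [consumer.handle(m) for m in deliveries]
```

A real consumer would keep the set of seen ids in durable storage and expire old entries; the in-memory set here only illustrates the idea.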
17. Amazon Web Services (4)
• Relational Database Service
– MySQL (no SLA)
– Automated backup and scaling
– $0.11 to $3.10 per hour (instance type), $0.10/GB/mo storage, $0.10
per million I/O ops, $0.15/GB transfer
• Elastic MapReduce
– Based on Hadoop
– Price: EC2 instance price + premium ($0.01 - $0.42/hour)
• CloudWatch, Auto Scaling, Elastic Load Balancer
– Monitoring, auto scaling & load balancing for EC2
• Virtual Private Cloud
18. Google AppEngine
• http://code.google.com/appengine/
• Features
– custom JVM (lots of limitations)
– servlet container, JSP
– Datastore based on BigTable (column store, consistent; C+P in CAP terms)
– JDO/JPA
– Google infrastructure services: URL fetch, mail
– Memcache (in-memory distributed key/value cache)
– Task queues & scheduler
– Development: local dev server, Eclipse plugins, administration
• Pricing
– traffic/GB $0.10 ($0.12); CPU/h $0.10; storage/GB/mo $0.15; e-mail
$1 per 10K
20. Google AppEngine (3)
• Restrictions
– Applications run in a restricted JVM sandbox
• No threads, no System calls, limited reflection
– No sub-process forking
– Connections
• Outbound – only URL fetch & mail
• Inbound – only HTTP(S)
– No filesystem writes (limited read access), use datastore instead
– Limits
• Request duration – 30 sec
• Request/response size – 10 MB (datastore request/response – 1MB)
• file size – 10 MB, number of files – 3,000
• Datastore: entity size – 1 MB, property values – 1000, entities per batch – 500
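A limit such as 500 entities per batch means a bulk write has to be split client-side. The helper below is a generic sketch in plain Python. It does not use any AppEngine API, and the actual datastore call for each chunk is left out.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

MAX_ENTITIES_PER_BATCH = 500  # limit quoted on the slide


def batches(items: Sequence[T], size: int = MAX_ENTITIES_PER_BATCH) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# 1200 placeholder entities split into chunks that respect the limit
chunk_sizes = [len(chunk) for chunk in batches(list(range(1200)))]
```

Each chunk would then be handed to one batch write. The same helper could be applied to the other size limits above by passing a different `size`.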
21. Google AppEngine (4)
• Datastore
– Based on BigTable, distributed column-store
• Entities and multi-valued properties
• Entities have unique key & a type (kind)
• Flexible schema
– Transactional, consistent
– JDO/JPA interface
– Example query: Select from Person where lastName = … && height < … order by height desc
• Queries
– JDOQL: entity kind + property value restrictions + sort order
– Cursors can be specified (query range)
– query resultset is materialised in a predefined index
• query execution only fetches data from the existing index
• queries with same kind + property restriction operator (but different
value filler) + same sort order share the same index
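The index-sharing rule above can be pictured as a "query shape": everything except the filter values. The sketch below is a simplified model of that rule as stated on the slide, not how AppEngine derives its indexes internally. The `Filter` and `Query` types and the sample values are hypothetical.

```python
from typing import NamedTuple, Tuple


class Filter(NamedTuple):
    prop: str
    op: str
    value: object


class Query(NamedTuple):
    kind: str
    filters: Tuple[Filter, ...]
    order: Tuple[Tuple[str, str], ...]  # (property, "asc" or "desc")


def index_signature(query: Query):
    """Kind + (property, operator) pairs + sort order; filter values are left out."""
    shape = tuple((f.prop, f.op) for f in query.filters)
    return (query.kind, shape, query.order)


q1 = Query("Person",
           (Filter("lastName", "=", "Smith"), Filter("height", "<", 180)),
           (("height", "desc"),))
q2 = Query("Person",
           (Filter("lastName", "=", "Jones"), Filter("height", "<", 170)),
           (("height", "desc"),))
q3 = q1._replace(order=(("height", "asc"),))
```

In this model `q1` and `q2` differ only in their filter values and so map to the same index, while `q3` changes the sort order and would need a different one.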
22. Windows Azure
• http://www.microsoft.com/windowsazure/
• Components
– Windows Azure
• Fabric – management & monitoring of cloud services (Hyper-V)
• Compute – hosted applications (.net, c++, java, …)
• Storage – blob storage, tables, queues (REST interface)
– SQL Azure
• Cloud based MS SQL Server
– AppFabric
• Infrastructure services, Service registry
• Access control
• Pricing
– CPU/h $0.12; storage $0.15/GB/mo, transfer $0.10 ($0.15), storage
transactions – $1 per 1 million
24. Contents
Part III
Programming for the Cloud
Tools & APIs
25. Programming for the Cloud
• Amazon
– REST API
– AWS Java SDK (http://aws.amazon.com/sdkforjava/)
– AWS Toolkit for Eclipse (http://aws.amazon.com/eclipse)
– Typica (http://code.google.com/p/typica/)
– JetS3t (S3 only) http://jets3t.s3.amazonaws.com/index.html
• Google AppEngine
– AppEngine SDK (dev server, admin tools, Eclipse plugins)
– Datastore: JDO, JPA, low-level Java API
– Memcache: JCache + low level Java API
– URL fetch: java.net + low level Java API
– Mail: java.mail + low level Java API
– Task queue, blob store, accounts: low level APIs
26. Programming for the Cloud (2)
• jClouds
– http://code.google.com/p/jclouds/
– Cloud interoperability framework (AWS, Google AppEngine*,
Windows Azure, GoGrid)
– Mostly storage oriented functionality
• Eucalyptus
– http://www.eucalyptus.com/
– Open source private cloud infrastructure
– AWS compatible (EC2, EBS, S3)
– Cross-hypervisor support
27. Don’t forget…
• Deploying on EC2 requires minimal to no modifications of
existing software
• EC2 has some big machines: 70GB RAM / 8 CPU cores
• 1,000 servers for 1hr cost the same as 1 server for 1,000hrs
• Data traffic (in/out) of the Cloud can be expensive
• Storage relatively cheap
• Internal cloud traffic is free (AWS), e.g. accessing other
applications/datasets on the Cloud
• CPU price: uptime (EC2) vs. computing cycles (AppEngine)
• EC2 spot instances (off-peak hours) are very, very cheap!
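The "uptime vs. computing cycles" point above can be compared numerically. This is a rough sketch: the AppEngine rate is the CPU/h figure from the pricing slide, while the instance-hour rate is a made-up placeholder, not a quoted EC2 price. It also assumes one instance corresponds to one CPU and ignores storage and traffic.

```python
APPENGINE_CPU_HOUR = 0.10  # $/CPU-hour, from the AppEngine pricing slide
INSTANCE_HOUR = 0.05       # hypothetical $/instance-hour, NOT a quoted EC2 price


def uptime_billed(hours_up: float, rate: float = INSTANCE_HOUR) -> float:
    """EC2-style: pay for every hour the instance is up, busy or idle."""
    return hours_up * rate


def cycles_billed(hours_up: float, cpu_utilisation: float,
                  rate: float = APPENGINE_CPU_HOUR) -> float:
    """AppEngine-style: pay only for the CPU time actually consumed."""
    if not 0.0 <= cpu_utilisation <= 1.0:
        raise ValueError("cpu_utilisation must be between 0 and 1")
    return hours_up * cpu_utilisation * rate


def break_even_utilisation(instance_rate: float = INSTANCE_HOUR,
                           cpu_rate: float = APPENGINE_CPU_HOUR) -> float:
    """CPU utilisation at which both billing models cost the same."""
    return instance_rate / cpu_rate
```

With these placeholder rates the break-even utilisation is 0.5: a mostly idle application is cheaper when billed per cycle, and a busy one is cheaper when billed per hour of uptime.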
28. Contents
Part IV
Semantic Web on the Cloud
29. Semantic Web on the Cloud
• Public Data Sets on AWS
– A lot of datasets hosted for free by Amazon
• Freebase, UniGene, US Census, …
– New data sets can be submitted too (after approval)
– Full LOD cloud still not available (due to licensing issues)
• SaaS
– Virtuoso (AWS hosted), OpenCalais, …
• “Semantic Cloud” initiatives (cloud interoperability & data
integration)
– E.g. fluidOps - Management & provisioning of semantic applications
(SaaS) and datasources (DaaS) on the Cloud
• Semantic Web apps as virtual appliances on the Cloud
• LOD data sources as virtual resources on the Cloud (“Self-service”
paradigm)
30. Unified Cloud Computing
• http://code.google.com/p/unifiedcloud/
• Uses RDF for cloud data interoperability
31. Useful and useless links
• http://groups.google.com/group/cloud-computing
• “An Essential Guide to Possibilities and Risks of Cloud
Computing”
• “Talking To Your CFO About Cloud Computing”
• Nick Carr @ Atmosphere’2009
• Introducing the Windows Azure platform