My talk on MetaCDN for the Cloudslam 2009 virtual conference.
Many 'Cloud Storage' providers have launched in the last two years, providing internet accessible data storage and delivery in several continents that is backed by rigorous Service Level Agreements (SLAs), guaranteeing specific performance and uptime targets. The facilities offered by these providers is leveraged by developers via provider-specific Web Service APIs. For content creators, these providers have emerged as a genuine alternative to dedicated Content Delivery Networks (CDNs) for global file storage and delivery, as they are significantly cheaper, have comparable performance and no ongoing contract obligations. As a result, the idea of utilising Storage Clouds as a 'poor mans' CDN is very enticing. However, many of these 'Cloud Storage' providers are merely basic storage services, and do not offer the capabilities of a fully-featured CDN such as intelligent replication, failover, load redirection and load balancing. Furthermore, they can be difficult to use for non-developers, as each service is best utilised via unique web services or programmer APIs. In this presentation, we describe the design, architecture, implementation and user-experience of MetaCDN, a system that integrates these 'Cloud Storage' providers into an unified CDN service that provides high performance, low cost, geographically distributed content storage and delivery for content creators. MetaCDN harnesses the power of 'Cloud Storage' for novices and seasoned users alike, offering an easy to use web portal and a sophisticated Web Service API.
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
MetaCDN: Enabling High Performance, Low Cost Content Storage and Delivery via the Cloud
1. MetaCDN: Enabling High
Performance, Low Cost
Content Storage and Delivery
via the Cloud
Dr. James Broberg (brobergj@csse.unimelb.edu.au)
http://www.csse.unimelb.edu.au/~brobergj
http://www.metacdn.org
2. Content Delivery
Networks (CDNs)
• What is a CDN?
• Content Delivery Networks (CDNs) such as
Akamai, Mirror Image and Limelight place web
server clusters in numerous geographical
locations to improve the responsiveness and
locality of the content it hosts for end-users.
3. Existing CDN providers
• Akamai is the clear leader in coverage and
market share (approx. 80%), however...
• Price is prohibitive for SME, NGO, Gov...
• Anecdotally 2−15 times more expensive
than Cloud Storage, and require 1−2 year
commitments and min. data use (10TB+)
• Academic CDNs include Coral, Codeen,
Globule, however...
• No SLA / QoS provided, only ‘best effort’
4. Major CDN pricing
• Most major CDN providers do not publish
prices
• “Average” prices from the 4-5 major CDNs
in the market:
• 50TB: $0.40 - $0.50 per GB delivered
• 100TB: $0.25 - $0.45 per GB delivered
• 250TB: $0.10 - $0.30 per GB delivered
• Information taken from Dan Rayburn @
www.cdnpricing.com, StreamingMedia.com
7. Feature Comparison
Nirvanix Amazon Mosso
Amazon S3
Global SDN CloudFront CloudFiles
99.9 99-99.9 99-99.9 99.9
SLA
256GB 5GB 5GB 5GB
Max File Size
Yes Yes Yes Yes
US PoP
EU PoP Yes Yes Yes Yes
Asia PoP Yes No Yes Yes
AUS PoP No No No Yes
Yes Yes Yes Yes
Per file ACL
Yes No Yes Yes
Automatic replication
of files
Yes Yes Yes Yes
Developer API’s /
Web Services
10. Introducing MetaCDN
• What if we could create a low-cost, high performance overlay
CDN using these storage clouds?
• MetaCDN harnesses the diversity of multiple storage cloud
providers, offering a choice of deployment location, features
and pricing.
• MetaCDN provides a uniform method / API for harnessing
multiple storage clouds.
• MetaCDN provides a single namespace to cover all supported
providers, making it simple to integrate into origin sites, and
handles load balancing for end-users.
• The MetaCDN system offers content creators (who are not
necessarily programmers!) a trivial way to harness the power
of multiple cloud storage providers via a web portal
11. How MetaCDN works
Amazon S3 & Mosso Cloud Microsoft Azure
Nirvanix SDN Coral CDN
CloudFront Files CDN Storage Service
JetS3t toolkit Nirvanix SDK Cloud Files SDK CoralConnector AzureConnector
Java SDK Java SDK Java SDK Java stub Java stub
Open Source Nirvanix, Inc Mosso, Inc MetaCDN.org MetaCDN.org
AmazonS3Connector NirvanixConnector CloudFilesConnector
Java stub Java stub Java stub Shared/Private
MetaCDN.org MetaCDN.org MetaCDN.org Host
MetaCDN
WebDAVConnector
Java stub
MetaCDN MetaCDN QoS MetaCDN MetaCDN
MetaCDN.org
Manager Monitor Allocator Database
SCPConnector
Java stub
Web Portal Load Redirector Web Service MetaCDN.org
Java (JSF/EJB) based portal Random redirection SOAP Web Service FTPConnector
Support HTTP POST Geographical redirection RESTful Web Service Java stub
New/view/modify deployment Least cost redirection Programmatic access MetaCDN.org
13. MetaCDN Allocator
• The MetaCDN Allocator allows users to
deploy files either directly or from a public
origin URL, with the following options:
• Maximise coverage and performance
• Deploy content in specific locations
• Cost optimised deployment
• Quality of Service (QoS) optimised
14. MetaCDN Manager
• The MetaCDN Manager ensures that:
• All current deployments are meeting QoS
targets (where applicable)
• Replicas are removed when no longer
required (minimising cost)
• A users’ budget has not been exceeded, by
tracking usage (i.e. storage/download)
15. MetaCDN QoS Monitor
• The MetaCDN QoS Monitor:
• Tracks the performance of participating
providers at all times
• Monitors and records throughput, response
time and uptime from a variety of locations
• Ensures that upstream providers are
meeting their Service Level Agreements
16. MetaCDN Database
1 0:M 1 1
MetaCDN CDN CDN
has for
User Credentials Provider
1
1 1
hosted
has has
by
M
1 M
M 1
1 0:M MetaCDN Coverage
deployed hosted
Content
Replica locations
as at
1
M
QoS Monitor measures
17. MetaCDN Web Portal
• Developed using Java Enterprise and Java
Server Faces (JSF) technologies
• JPA/MySQL back-end to store persistent data
• Web portal acts as the entry point to the
system and application-level load balancer
• Most suited for small or ad-hoc deployments,
and especially useful for less technically
inclined content creators.
18. Don't have a MetaCDN account? Sign in
Register Username: master
Password: ••••••
Login
All trademarks mentioned herein are the exclusive property of their respective owners.
This project is supported by the:
19. Register new MetaCDN account:
Username: Full Name:
joebloggs Joe Bloggs
Password: Email:
•••••• joe@blogs.com
Preferred Providers: Nirvanix SDN Amazon S3
Mosso Cloud Files Microsoft Azure Storage Service
Shared/Private Host
Microsoft Azure Storage Service Shared/Private Host
Nirvanix SDN Amazon S3 Mosso Cloud Files
Enter your Amazon S3 Credentials:
******************************
AWS Access Key:
******************************
AWS Secret Key:
Enable CloudFront:
Register Cancel
All trademarks mentioned herein are the exclusive property of their respective owners.
This project is supported by the:
20. Welcome to the MetaCDN Portal!
Deploy content View existing content Deployment Map
View existing replicas MetaCDN Analytics
Logout
Edit Account
All trademarks mentioned herein are the exclusive property of their respective owners.
This project is supported by the:
21. Sideload Content:
Choose URL: http://pitchfork.com/media/frontend/images/header_logo.gif
North America Europe
Deployment region(s):
Asia Australasia
Host Until: 30/05/2009
Deploy Cancel
All trademarks mentioned herein are the exclusive property of their respective owners.
This project is supported by the:
25. MetaCDN Web Service
• Makes all functionality available via Web
Services (SOAP & REST/HTTP)
• Web interface is useful for novices and for
ad-hoc deployments, but doesn’t scale
• Larger customers have 1,000’s - 10,000 -
100,000s of files that need deployment
• Let them automate their deployment and
management via Web Services!
• Perfect for Mashup developers!!!
26. MetaCDN Load
Redirector
• We have created a unified namespace to
simply deployment, routing, management
• Currently, file deployment results in multiple
URLs, each mapping to a replica
• http://metacdn-eu-user.s3.amazonaws.com/myfile.mp4
• http://metacdn-us-user.s3.amazonaws.com/myfile.mp4
• http://node3.nirvanix.com/MetaCDN/user/myfile.mp4
• Single namespace is created for fine control
• http://www.metacdn.org/FileMapper?itemid=2
27. MetaCDN Load
Redirector (cont.)
• Actual load redirection logic depends on deployment
option used by MetaCDN user:
• Maximise coverage and performance? - Find closest
physical replica
• Deploy content in specific locations? - Find closest
physical replica
• Cost optimised deployment? - Find cheapest
replica, minimises cost to maximise lifetime
• Quality of Service (QoS) optimised? - Find best
(historically) performing replica
28. MetaCDN Load
Redirector (cont.)
MetaCDN end-user DNS Server MetaCDN gateway
Resolve www.metacdn.org
Return IP of closest MetaCDN gateway,
www-na.metacdn.org
GET http://metacdn.org/MetaCDN/FileMapper?itemid=1
processRequest ()
geoRedirect ()
HTTP 302 Redirect to
http://metacdn-us-username.s3.amazonaws.com/filename.pdf
Amazon S3 USA
Resolve metacdn-us-username.s3.amazonaws.com
Return IP of metacdn-us-username.s3.amazonaws.com
GET http://metacdn-us-username.s3.amazonaws.com/filename.pdf
Return replica
29. Redirector Overhead
from USA
1.8
S3 USA
MetaCDN
1.6
1.4
1.2
1
Seconds
0.8
0.6
0.4
0.2
0
0 5 10 15 20 25
Hour
30. Redirector Overhead
from Australia
0.9
Nirvanix SDN #3
MetaCDN
0.85
0.8
0.75
Seconds
0.7
0.65
0.6
0.55
0 5 10 15 20 25
Hour
31. Evaluating MetaCDN
performance
• Ran tests over 24 hour period (mid-week)
• Downloading test replicas (1KB, 10MB) 30
times per hour, take average and conf. inter.
• 10MB - Throughput, 1KB - Response Time
• Ran test from 6 locations: Melbourne (AUS),
Paris (FRA), Vienna (AUT), San Diego & New
Jersey (USA), Seoul (KOR)
• Replicas located across US, EU, ASIA, AUS
32. Summary of Results -
Throughput (KB/s)
S3 S3 SDN SDN SDN SDN
Coral
US EU #1 #2 #3 #4
Melbourne
264.3 389.1 30 366.8 408.4 405.5 173.7
Australia
703.1 2116 483.8 2948 416.8 1042 530.2
Paris France
Vienna
490.7 1347 288.4 2271 211 538.7 453.4
Austria
Seoul
312.8 376.1 466.5 411.8 2456 588.2 152
South Korea
San Diego
1234 323.5 5946 380.1 506.1 820.4 338.5
USA
New Jersey
2381 1949 860.8 967.1 572.8 4230 636.4
USA
33. Summary of results -
Response Time (Sec)
S3 S3 SDN SDN SDN SDN
Coral
US EU #1 #2 #3 #4
Melbourne
1.378 1.458 0.663 0.703 1.195 0.816 5.452
Australia
0.533 0.2 0.538 0.099 1.078 0.316 3.11
Paris France
Vienna
0.723 0.442 0.585 0.099 1.088 0.406 3.171
Austria
Seoul
1.135 1.21 0.856 0.896 1 0.848 3.318
South Korea
San Diego
0.232 0.455 0.23 0.361 0.775 0.319 4.655
USA
New Jersey
0.532 0.491 0.621 0.475 1.263 0.516 1.916
USA
34. Summary of results
(cont)
• Clients benefited greatly from local replicas
• Results are consistent in terms of response time
and throughput with previous studies of
dedicated CDNs
• Back-end providers have sufficient performance
and reliability to be used to host replicas in the
MetaCDN system
• Adding “CDN” Cloud Providers (Amazon
CloudFront, Mosso Cloud Files) will likely
significantly improve results.
35. MetaCDN features in
planning / development
• Support as many providers as possible
• Windows Azure Storage Service support will
be finished very shortly.
• Non-“Cloud” storage (i.e. FTP, WebDav) will
be available that is seamlessly integrated.
• Integrated Flash video and MP3 audio streaming
with embeddable players for customers to place
on their origin sites.
36. MetaCDN features in
planning / development
• Autonomic deployment management
(expansion/contraction) based on demand
• Autonomic deployment management
(expansion/contraction) based on QoS
• Security / ACL framework that spans all
cloud storage providers
• “One time” MetaCDN URLs
37. Collaborations
• Always looking for people to collaborate on
the project
• Work to be done on:
• Load balancing / redirection algorithms
• Intelligent Caching / replication algorithms
• Security / Access Control of content
• Improving MetaCDN Web Services
• Please contact me if you are interested...
38. Acknowledgements
• Australian Research Council (ARC) for
funding the project.
• Cory & Barry (Nirvanix) and Eric (Rackspace/
Mosso) for development support.
39. Publications
• J. Broberg and Z. Tari. MetaCDN: Harnessing Storage Clouds for High
Performance Content Delivery. In Proceedings of The Sixth International
Conference on Service-Oriented Computing [Demonstration Paper]
(ICSOC 2008), LNCS 5364, pp. 730–731, 2008.
• J. Broberg, R. Buyya and Z. Tari. Creating a ‘Cloud Storage’ Mashup for High
Performance, Low Cost Content Delivery, Second International Workshop
on Web APIs and Services Mashups (Mashups’08), In Proceedings of The
Sixth International Conference on Service-Oriented Computing
Workshops, LNCS 5472, pp. 178–183, 2009.
• J. Broberg, R. Buyya, and Z. Tari, MetaCDN: Harnessing ‘Storage Clouds’ for
high performance content delivery, Journal of Network and Computer
Applications (JNCA), To appear, 2009
• R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic. Cloud computing
and emerging IT platforms: Vision, hype, and reality for delivering computing
as the 5th utility, Future Generation Computer Systems. To appear, 2009.
40. Thanks
Latest information will be available at:
www.metacdn.org
International Workshop on Cloud Computing
(Cloud 2009):
http://www.gridbus.org/cloud2009
Notes de l'éditeur
MetaCDN is a system that leverages several existing ‘storage clouds’, creating an integrated overlay network that provides a low cost, high performance content delivery network for content creators.
*Content Delivery Networks (CDNs) such as Akamai and Mirror
Image place web server clusters in numerous geographical locations
to improve the responsiveness and locality of the content it hosts for
end-users.
*However, their services are priced out of reach for all but the largest enterprise customers.
Major CDN providers are notoriously cagey about revealing their prices. Most will only reveal their prices if you are serious customer and are willing to commit to a contract and minimum data usage (as detailed in the previous slide). As such Dan Rayburn @ StreamingMedia.com (a blog run for streaming media and CDN professionals) undertakes an informal sampling of pricing (taken from CDN customers) every few months.
Numerous ‘storage cloud’ providers (or ‘Storage as a Service’) have emerged that can
provide data storage and delivery in several continents, offering S.L.A. backed performance and uptime promises for their services.
Each storage provider outlines their own cost structure for data transferred in and out of their service, as well as charging for persistent storage. In each of these cases, the costs are in the order of cents per gigabyte. Pricing scales downward based on higher usage for all providers. There is no minimum data usage requirement and no contracts - you only pay for what you store and transfer.
The providers themselves have very similar core functionality, but there are some key differences, for example, the largest allowable file size, the coverage footprint or specific features.
It is easy to see why storage clouds provide a compelling alternative to traditional CDNs for content producers that transfer significant amounts of data to their customers.
Amazon CloudFront offers a CDN-like service that is significantly cheaper than tradional CDNs. Amazon charges different rates depending on where the data is delivered from to reflect the cost of data transfer and operations in different locations.
1. MetaCDN is more likely to meet the needs of content creators than a single provider could.
2. There is no ‘unified’ or familiar interface for all storage clouds. Consider Amazon S3, Nirvanix SDN, Mosso Cloud Files and Microsoft Azure Storage Service. These four cloud storage providers have four separate access APIs that a developer would need to learn to access these services.
3. If a content creator attempted to utilise these providers themselves, they would essentially need to perform the load balancing and redirection themselves at their origin sites (complex!)
The service is presented to end-users as a web portal for small or ad-hoc deployments or as Web Services (currently under development) for integration of customers with more complex and frequently changing content delivery needs. The web portal was developed using Java Enterprise and Java Server Faces (JSF) technologies, with a MySQL back-end to store user accounts, deployments, and the capabilities and pricing of service providers.
Introduce connectors, and major components.
The MetaCDN system integrates with each storage provider via connectors that provides an abstraction to hide the complexity arising from the differing ways each provider allows access to their systems. The connectors encapsulate basic operations like creation, deletion and renaming of files and folders. If an operation is not supported on a particular service, then the connector for that service should throw a FeatureNotSupportedException.
1. MetaCDN deploys as many replicas as possible to all available locations.
2. A user nominates regions and MetaCDN matches the requested regions
with providers that service those areas.
3. Where MetaCDN deploys as many replicas in the locations requested by the user as their
storage and transfer budget will allow, keeping them active until that budget is exhausted.
4. MetaCDN deploys to providers that match specific QoS targets that a user specifies, such as average throughput or response time from a particular location, which is tracked by persistent probing from the MetaCDN QoS monitor.
The MetaCDN database tracks all pertinent information such as users of the system, credentials for various providers, details about the providers capabilities, pricing and footprint and details of replicas deployed.
Using the web portal, users can sign up for an account on the MetaCDN system, and
enter credentials for any cloud storage or other provider they have an account with. Once this simple step has been performed, they can utilise the MetaCDN system to intelligently deploy content onto storage providers according to their performance requirements and budget limitations.
A MetaCDN user is required to register the credentials of providers they have accounts with. Once this step is done they do not need to worry about how to interact with each of the providers. Eventually we would like MetaCDN users to not require accounts with specific providers - rather MetaCDN would provide consolidated billing of users for storage and transfer of content.
The MetaCDN \"Control Panel\" gives easy access to the core features of the service. You can deploy content, view existing deployments (via a high level content view or a detailed replica view), and view a deployment map overlayed onto Google Maps or Google Earth.
Here we can see an example of geographical-based deployment. A user nominates regions and MetaCDN matches the requested regions with providers that service those areas. The user also specifies the desired lifetime of the deployment, after which the replicas will be removed.
We can view details of our past deployments. We store and track information such as the origin id (i.e. the original source of the content), a unique GUID, the MetaCDN URL that represents the deployment, the number of times this content has been downloaded, the last time this content was downloaded and how many replicas were generated from this deployment.
Here we can see the specific replicas that have been generated from our various deployments. For each replica, we can see which provider and location was utilised, the public URL of the replica, the number of times the replica has been downloaded, the last access time of a specific replica and options to modify, delete, or view the replica if we wish to fine tune our deployment.
We can get a birds eye view of where our replicas are stored, and how many are stored in each location. MetaCDN generates a KML file for each user that is used to overlay on Google Maps (shown here) or we can view our deployments in Google Earth. We expect to overlay more useful information in these views in the near future, such as the cost expenditure at each location and the location of client (i.e. file consumer) hotspots.
A web service interface is under development that will make all the functionality of the web portal available in a programmatic fashion. Obviously it's not feasible to deploy thousands of files manually via the web portal so we need to prove the facility for advanced customers to scale out easily and rapidly.
*With multiple sources (and multiple URL’s) the complexity of load
balancing is imposed on the origin / content provider
*With single namespace we can have coarse and fine-grained control
via DNS redirection and layer4/7 load balancing
http://www.metacdn.org:8080/MetaCDN/FileMapper?itemid=1
http://www.metacdn.org:8080/MetaCDN/FileMapper?itemid=1&policy=RAN
http://www.metacdn.org:8080/MetaCDN/FileMapper?itemid=1&policy=GEO
During the development phase only, only 1 copy of portal/redirector is running in Melbourne, Australia. The plan is to deploy portals in several locations across US, Asia and Europe. We will see from next slide why this is necessary.
Let's assume that a consumer in the USA was accessing a replica directly (i.e. it magically knew the best replica to select), or via MetaCDN. Here we can see that there is around 0.4 seconds of overhead, which is predominantly the round-trip time to access the gateway in Australia.
When a consumer has a gateway that is close to it (in this case, a consumer in Australia is utilising a local gateway) the overhead is significantly smaller, in the order of 0.05 seconds per request. It is obvious that local gateways are needed in key areas to maximise performance.
In the second half of 2008 we evaluated the two major cloud storage providers at the time, Amazon S3 and Nirvanix SDN. We ran the test over 24 hours from a variety of client and replica locations to see whether the providers demonstrated sufficient performance (i.e. throughput and response times) to act as a \"poor man's\" CDN.
In 5 out of 6 client locations there were at least 2 replicas each that delivered throughput that is consistent with what we would expect from a traditional CDN service.
<Kilobytes per second>
In 4 our of 6 client locations there was 1-3 replicas that delivered response time that is consistent with what we would expect from a traditional CDN service. It is worth noting here that these times represent the end to end latency and connection time (i.e. a HTTP connection is made), they are not simply ping measurements.
FTP/WebDav support will be useful in locations where cloud providers do not (and are unlikely to) service.
There is a lot of demand from customers to move away from Youtube and Vimeo flash video hosts and host their own streaming content directly on their origin site. This way they control the look and feel and ad monetization of their content.