An Empirical Performance Study of AppEngine and AppScale
Fei Dong, Yunjia Zhou, Xuanran Zong
Duke University
1 Introduction

In recent years, we have witnessed an increasing trend in cloud computing usage. The prominence of cloud computing comes from its elasticity and its "pay-as-you-go" charging model. On one hand, with cloud computing, small enterprises do not need to make any upfront investment in infrastructure and IT staff, which saves cost and reduces the risk of over-provisioning; on the other hand, cloud providers can multiplex workloads from many customers and improve the utilization of their data centers.

In general, there are three types of cloud computing models: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). Each cloud provider chooses one model to build its cloud infrastructure. For example, Amazon EC2 [1] is built as IaaS, i.e., Amazon rents raw virtual machines to its customers. At the other end of the spectrum, Google AppEngine [4] and Microsoft Azure are PaaS, because they only provide an interface for customers to host web applications in the provider's data centers. While several public cloud services are offered by commercial companies, academia is also working to offer open-source cloud services. For instance, UCSB has released Eucalyptus [3], which closely mimics Amazon EC2, and AppScale [6], which closely mimics Google AppEngine. Therefore, any party that owns a cluster can become a public cloud provider by deploying either Eucalyptus or AppScale on the cluster.

However, there is a caveat: although Eucalyptus and AppScale can mimic the functionality, do they also provide the same performance? In this project, we try to answer one small facet of this question. First, we compare the performance of Google AppEngine and AppScale, in particular from a request-latency perspective. Second, we compare the performance of the different databases supported by AppScale [5]. By answering these two questions, we want to get a general sense of how well AppScale performs compared to Google AppEngine.

The rest of this report is organized as follows. Section 2 briefly introduces how AppEngine and AppScale work. Section 3 describes how we deployed AppScale, Section 4 presents our experiment framework, and Section 5 describes our experiments and their results. Section 6 concludes.

2 Background

2.1 Google AppEngine

App Engine allows you to deploy your web applications to Google's highly scalable infrastructure. Although the infrastructure is designed to scale, there are a number of ways to optimize the performance of an application, which result in an improved user experience and lower resource consumption. App Engine includes the following features:

• dynamic web serving, with full support for common web technologies
• persistent storage with queries, sorting and transactions
• automatic scaling and load balancing
• APIs for authenticating users and sending email using Google Accounts
• a fully featured local development environment that simulates Google App Engine on your computer
• task queues for performing work outside of the scope of a web request
• scheduled tasks for triggering events at specified times and regular intervals

Applications can run in one of two runtime environments: the Java environment and the Python environment. Each environment provides standard protocols and common technologies for web application development. Each app is allocated resources within limits.

With App Engine, Google takes care of everything for you. The App Engine datastore provides distribution, replication, and load-balancing services behind the scenes, freeing you to focus on implementing your business logic. App Engine's datastore is powered mainly by two Google services: Bigtable and the Google File System (GFS). Bigtable is a highly distributed and scalable service for storing and managing structured data. Bigtable uses a non-relational object model to store entities, allowing you to create simple, fast, and scalable applications. The datastore also uses GFS to store data and log files. GFS is a scalable, fault-tolerant file system designed for large, distributed, data-intensive applications. App Engine uses the Java Persistence API (JPA) and Java Data Objects (JDO) interfaces for modeling and persisting entities.

2.2 AppScale

AppScale was developed at UCSB to mimic the PaaS cloud model. It offers almost the same application interface as Google AppEngine, including the same application structure, the same back-end data storage API (they all use JDO), and a very similar application deployment routine. Therefore, people can switch their applications from one platform to the other without any modification.

AppScale can be deployed on three platforms: a KVM-enabled cluster, Eucalyptus, and EC2. AppScale provides a set of commands written in Ruby to ease deployment: users only need to build the image from source and deploy it. Though this sounds trivial at first glance, we confronted many issues when we deployed it on our department's Eucalyptus cluster. We elaborate on this in Section 3.

Once AppScale is deployed, we can upload GAE applications to the system. Each application is served by three major components provided by AppScale: the AppServer, the data storage, and the AppLoadBalancer (ALB), which closely resemble the components of a 3-tier web system. The ALB acts as an HTTP server and load balancer: when it receives a request, it redirects the request to an AppServer that hosts the application, and that AppServer handles the connection. The AppServer plays the role of the application server in a typical web system: it hosts the application and performs servlet processing. Finally, the data storage layer stores all persistent data. AppScale gives users great flexibility in choosing a back-end database, including HBase, Hypertable, MySQL, Cassandra, Voldemort, MongoDB, MemcacheDB, and Scalaris. More importantly, they all share the same API provided by JDO, so users can run an application seamlessly on different data stores. In other words, users do not need to modify their code to accommodate a new back-end store.

3 Deployment

Initially, we attempted to build an AppScale image from source and launch AppScale instances on our local Eucalyptus cluster. We first used the AppScale tools to create the AppScale image emi-9F410FB6 on dbc1-03 and ran appscale-run-instances to launch the AppScale instance. The command successfully launched an instance on the Eucalyptus cluster, but then blocked and never completed the launch (see the "[Blocked here]" marker in Figure 1). Figure 1 shows the AppScale log when we executed appscale-run-instances. From the log, we can observe that AppScale launched two instances running the AppScale image on the Eucalyptus cluster, and we could even log in to those instances. Yet the launch remained stuck in some post-preparation step, which is not the expected behavior described in [2]. Changing other configuration parameters did not help either. We read most of the online AppScale documentation and did not find any clue.

$appscale-run-instances --min 1 --max 1 --file sample_apps/guestbook/
--machine emi-9F410FB6 --table memcachedb --infrastructure euca --keyname
dongfei --instance_type c1.xlarge -v --force
About to start AppScale over a cloud environment with the euca tools with
instance type c1.xlarge.
----------- repeat the time ----
Reported Public IPs: [192.168.1.35]
Reported Private IPs: [192.168.1.35]
Please wait for your instance to complete the bootup process.
New secret key is 1Zu22syYs2jKhs2nTpuKhm2bY2nJuV ft
"machine"=>"emi-9F410FB6", "keyname"=>"myapp", "ips"=>"",
"replication"=>"1", "instance_type"=>"c1.xlarge",
"ec2_access_key"=>"DK4LXEFhkcYf8vNztq0FhKXEF5mpW15vAinYfw",
"infrastructure"=>"euca", "table"=>"memcachedb", "min_images"=>"1",
"ec2_secret_key"=>"cfKFBU35lbdAg8soawO9NcumXQgqqUKh9aOSg", "appengine"=>"3",
"ec2_url"=>"http://152.3.144.15:8773/services/Eucalyptus",
"keypath"=>"myapp.key", "hostname"=>"192.168.1.35", "max_images"=>"1"
Head node successfully created at 192.168.1.35.
It is now starting up memcachedb via the command line arguments given.
Generating certificate and private key Copying over credentials for cloud
Starting server at 192.168.1.35 Please wait for the controller to finish
pre-processing tasks.
This AppScale instance is linked to an e-mail address giving it administrator
privileges.
Enter your desired administrator e-mail address: [dongfei@xxx.com]
The new administrator password must be at least six characters long and can
include non-alphanumeric characters.
Enter your new password: [xxxxxx]
Enter again to verify: [xxxxxx]
Please wait for AppScale to prepare your machines for use.
[Blocked here]
dongfei@dbc1-03:~$ euca-describe-instances |grep emi-9F410FB6
INSTANCE i-47D0099E emi-9F410FB6 192.168.1.34 192.168.1.34
running cps212 0 c1.xlarge 2010-12-01T04:15:22.934Z
dukecs-pod1 eki-0AC4191A eri-59AE1A00
Figure 1: AppScale log during instance launching
Someone on the AppScale mailing list suggested that we run appscale-run-instances on the master node: 'can you try using the AppScale tools that are installed on the master node (and if it's not there at /usr/local/appscale-tools, can you install it)? Try a run-instances on the master node and I think the networking problems should be ok'. So we ran the command there, but AppScale was still blocked. Robin Hood told us: 'It seems there are some thing with your database. The instance need at lease 1GB memory to start database daemon'.

Given the limited time left, we decided to give up on the Eucalyptus cluster and switch to Amazon EC2. Since EC2 already provides a default AppScale image, we did not have to build our own. AppScale ran smoothly on EC2 for all the configuration parameters we tried (different databases, different numbers of instances, etc.). The log of a successful launch of the AppScale image is attached in the Appendix.

4 Experiment Framework

We have two goals in this project:

• to compare the performance of Google AppEngine and AppScale
• to compare the performance of the various databases provided by AppScale

There are many performance metrics, such as throughput, goodput, and latency. We selected request latency because it is the metric that cloud users care about most.

We developed an automated experiment framework to conduct a series of experiments. We first wrote our core test script, benchmark.py, which takes configuration parameters such as the number of requests and the concurrency. The script is implemented using a thread pool whose size is set by the concurrency. Each thread pulls a job from a job queue, executes it, and measures its completion time. In our experiment, a job is sending a single request to our web application. After all requests have been sent, the script prints some statistics, such as the mean request latency. Figure 2 shows a sample output produced by running benchmark.py. Note that this script is generic and extensible: you can define your own job behavior and use the same script to conduct other experiments.

Figure 2: Sample benchmark.py output

We then use this test script to send concurrent requests to the server in order to saturate it. We cannot use a single machine to send requests, because that would be like using a single commodity desktop to saturate a commercial high-performance server. Hence, we employ our department cluster as worker slaves that send concurrent requests, while our local machine acts as the master that coordinates the experiment on each worker. We use all the cluster machines (i.e., linux1.cs.duke.edu to linux30.cs.duke.edu). Each worker runs benchmark.py with a concurrency of 10, so in total we have 10 × 30 = 300 outstanding concurrent requests to saturate the server. In addition, when we run multiple threads on one worker, we notice some overhead that depends on the concurrency. To account for this overhead, we always fully use all the threads of one machine before using the next machine, which keeps the thread overhead constant. In summary, our framework works as follows:

1 The master copies all the experiment-related scripts and configuration files to each worker.
2 The master then issues a command to all workers simultaneously to invoke benchmark.py.

3 In the end, all the workers return their experiment results to the master, which then processes the final data.
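The thread-pool design of benchmark.py described above can be illustrated with a minimal Python sketch. It is not the authors' script: the names `run_benchmark`, `http_get_job`, and `summarize` are hypothetical, a job is modeled as any zero-argument callable, the HTTP job assumes a plain GET to a placeholder URL of the deployed application, and skipping failed requests in the statistics is an assumption.

```python
import queue
import statistics
import threading
import time
import urllib.request
from typing import Callable, Dict, List

Job = Callable[[], None]


def http_get_job(url: str) -> Job:
    """Hypothetical job: send one GET request to `url` and read the response."""
    def job() -> None:
        with urllib.request.urlopen(url, timeout=60) as resp:
            resp.read()
    return job


def run_benchmark(job: Job, num_requests: int, concurrency: int) -> List[float]:
    """Run `job` num_requests times on a pool of `concurrency` threads.

    Returns the latency of every successful job in milliseconds.
    """
    jobs: "queue.Queue[Job]" = queue.Queue()
    for _ in range(num_requests):
        jobs.put(job)

    latencies: List[float] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                next_job = jobs.get_nowait()
            except queue.Empty:
                return  # job queue drained: this thread exits
            start = time.perf_counter()
            try:
                next_job()
            except Exception:
                continue  # assumption: failed requests are left out of the statistics
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            with lock:
                latencies.append(elapsed_ms)

    threads = [threading.Thread(target=worker) for _ in range(concurrency)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return latencies


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Statistics reported at the end of a run, such as the mean request latency."""
    return {
        "requests": float(len(latencies)),
        "mean_ms": statistics.mean(latencies),
        "median_ms": statistics.median(latencies),
        "max_ms": max(latencies),
    }
```

One worker's run would then look like `summarize(run_benchmark(http_get_job(url), num_requests=100, concurrency=10))`, where `url` is a placeholder for the application's address. Because jobs are plain callables, a different job (for example, a write request) can be swapped in without touching the thread pool, which matches the extensibility the text describes.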
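The master-side bookkeeping described above (fill one worker's 10 threads before using the next worker, then merge the workers' results) can be sketched in Python as follows. The function names are hypothetical, copying files to the workers and launching benchmark.py remotely (steps 1 and 2) are not shown, and the merge assumes each worker reports its request count together with its mean latency. Under that assumption, the overall per-request mean is a count-weighted average of the per-worker means rather than a plain average of them.

```python
from typing import Dict, List, Sequence, Tuple


def allocate(total_concurrency: int, hosts: Sequence[str],
             per_worker: int = 10) -> Dict[str, int]:
    """Assign concurrent threads to workers, filling each worker before the next."""
    if total_concurrency > per_worker * len(hosts):
        raise ValueError("not enough workers for the requested concurrency")
    plan: Dict[str, int] = {}
    remaining = total_concurrency
    for host in hosts:
        if remaining <= 0:
            break
        threads = min(per_worker, remaining)
        plan[host] = threads
        remaining -= threads
    return plan


def combine_means(results: List[Tuple[int, float]]) -> float:
    """Merge per-worker (request count, mean latency in ms) pairs into one mean."""
    total = sum(count for count, _ in results)
    if total == 0:
        raise ValueError("no completed requests")
    return sum(count * mean for count, mean in results) / total


# Worker host names are the cluster machines named in the text.
HOSTS = [f"linux{i}.cs.duke.edu" for i in range(1, 31)]
```

For example, `allocate(25, HOSTS)` returns `{"linux1.cs.duke.edu": 10, "linux2.cs.duke.edu": 10, "linux3.cs.duke.edu": 5}`, so only three workers carry threads and each full worker has the same thread overhead. `combine_means([(100, 200.0), (50, 500.0)])` returns `300.0`, whereas a plain average of the two means would give 350.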
4.1 Request Workload

We further noticed that even with thirty workers sending concurrent requests, we still could not saturate the server. The reason is that the application we had chosen is too lightweight and does not impose much computation or storage load. Therefore, we modified the application to increase its workload: in our experiment, each request performs 1,000,000 floating-point computations and ten SQL (select) queries. We expect this workload to capture both the computation and the data-processing performance offered by the cloud services.

We also measured the performance of the different databases provided by AppScale. This involves comparing read, write, and delete operations separately, so we also wrote write and delete workloads. The write workload consists of 1000 data-insertion queries (write operations). The delete workload removes all existing data in the system, which is typically just the 1000 entries inserted previously.

5 Evaluation

5.1 AppEngine vs AppScale

In this experiment, we evaluate the scalability of AppEngine and AppScale with different numbers of nodes.

First, we deployed the modified application described in Section 4.1 both on Google AppEngine and on AppScale running on Amazon EC2 with 1, 2, and 4 instances, respectively. Then, we used our framework to saturate the server, sending from 10 to 200 concurrent requests, and measured the mean request latency. Figure 3 shows the performance in each case.

Figure 3: Performance of AppEngine and AppScale (mean request latency in ms, 0 to 30000, versus number of concurrent requests, 0 to 200; series: Google AppEngine, and AppScale on EC2 with one, two, and four m1.large instances)

As we can see, the mean request latency of Google AppEngine is considerably lower than that of AppScale. Moreover, AppEngine scales well: as the number of concurrent requests increases, its performance remains stable. AppScale, however, is not as scalable as AppEngine: its latency increases dramatically as the number of concurrent requests grows.

Since the number of instances may be a dominant factor in scalability, we also evaluated the scalability of AppScale with different numbers of EC2 instances. As Figure 3 shows, four nodes perform much better than one node. We can therefore tentatively conclude that AppScale can handle more concurrent requests when more instances are added to its master-slave deployment.

AppEngine appears to outperform AppScale. However, we hesitate to draw this conclusion firmly, because we do not know how many instances Google AppEngine used to serve our application. Google may have dedicated many machines to our concurrent requests, which would explain its high scalability. In other words, AppScale's performance might also remain stable if we ran enough instances.

5.2 Data Storage Performance

AppScale offers seven different database interfaces for the client to choose from. We therefore evaluated the performance of the different types of databases under write-heavy, read-heavy, and delete workloads.
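The three storage workloads used in this section (writes of 1000 entries of 500 bytes each by 10 concurrent threads, read requests that combine 1M floating-point computations with 10 data queries, and a delete pass that removes all existing data) can be sketched in Python as follows. This is only an illustration: `InMemoryStore` is a hypothetical stand-in for an AppScale back end, not the JDO API the application actually used, and the floating-point loop approximates the "1M floating-point computation" with one multiply and one add per iteration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional


class InMemoryStore:
    """Hypothetical stand-in for a datastore back end (not an AppScale API)."""

    def __init__(self) -> None:
        self._data: Dict[int, bytes] = {}
        self._lock = threading.Lock()  # writes arrive from several threads

    def put(self, key: int, value: bytes) -> None:
        with self._lock:
            self._data[key] = value

    def get(self, key: int) -> Optional[bytes]:
        with self._lock:
            return self._data.get(key)

    def delete(self, key: int) -> None:
        with self._lock:
            self._data.pop(key, None)

    def keys(self) -> List[int]:
        with self._lock:
            return list(self._data)


def write_workload(store: InMemoryStore, entries: int = 1000,
                   threads: int = 10, entry_size: int = 500) -> None:
    """Insert `entries` values of `entry_size` bytes using `threads` threads."""
    payload = b"x" * entry_size
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() consumes the results so an exception in any worker is re-raised here
        list(pool.map(lambda key: store.put(key, payload), range(entries)))


def read_request(store: InMemoryStore, flops: int = 1_000_000,
                 queries: int = 10) -> int:
    """One read request: a floating-point loop, then `queries` lookups.

    Returns how many of the looked-up keys were found.
    """
    acc = 0.0
    for i in range(flops):
        acc = acc * 0.5 + float(i)  # one multiply and one add per iteration
    return sum(1 for key in range(queries) if store.get(key) is not None)


def delete_workload(store: InMemoryStore) -> None:
    """Remove every existing entry (normally the ones the write workload inserted)."""
    for key in store.keys():
        store.delete(key)
```

Timing each workload against a real back end, rather than this in-memory stand-in, would be needed to produce measurements like those in Figures 4 and 5; the sketch itself makes no performance claims.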
For writes, we used 10 threads to concurrently append 500-byte data entries to the back-end data store, inserting 1000 entries in total. For reads, each request made the server perform 1M floating-point computations and 10 data queries, and we sent 100 read requests in total. For deletes, we simply removed all the existing data.

Figure 4 shows the performance of four of the seven databases on a single instance. We show only four because the remaining three databases must be deployed in the master-slave model, which cannot work on a single node. From the graph, we notice significant performance differences between databases; e.g., Voldemort and MongoDB are much slower than MemcacheDB and Cassandra. Therefore, we need to choose carefully which database to use.

Figure 4: Read operations on a single instance

Figure 5 compares the performance of the different databases on two instances for write, read, and delete operations, respectively. As we can see, MemcacheDB performs quite well, while the latency of Voldemort is quite long.

Figure 5: Read/Write/Delete operations on two instances ((a) write operation, (b) read operation, (c) delete operation)

However, we cannot conclude that the Voldemort database is slow or not useful. We consulted some resources to find out why it performed poorly. One possible reason is that it is a distributed, non-relational database, which may explain its slower performance in these experiments.

6 Conclusion

In this project, we conducted an empirical performance study of AppEngine and AppScale. We are interested in these two platforms because, from the programming-interface perspective, AppScale closely mimics the AppEngine API, and we wanted to find out how much their performance differs. In our experiments, AppEngine performed much better than AppScale, both in terms of request latency and scalability. Moreover, we also conducted experiments to compare the different data storage services provided by AppScale. Our results show that different storage services deliver dramatically different performance. Hence, we need to choose carefully when we host our application on AppScale. In the future, besides the performance comparison, we may also explore the cost of AppEngine and AppScale and investigate whether sacrificing a little performance can save more cost.

References

[1] Amazon Elastic Compute Cloud. http://aws.amazon.com/ec2.
[2] AppScale Documentation. http://code.google.com/p/appscale/wiki/Deploying_AppScale_1_4_via_Eucalyptus#Running_a_Sample_Application.
[3] Eucalyptus. http://open.eucalyptus.com.
[4] Google AppEngine. http://appengine.google.com.
[5] Bunch, C., Chohan, N., Krintz, C., Chohan, J., Kupferman, J., Lakhina, P., Li, Y., and Nomura, Y. An evaluation of distributed datastores using the AppScale cloud platform.
[6] Chohan, N., Bunch, C., Pang, S., Krintz, C., Mostafa, N., Soman, S., and Wolski, R. AppScale design and implementation, 2009.

A Appendix I: Log from a successful instance launch

Please refer to Figure 6.
/usr/local/appscale-tools/bin/appscale-run-instances --min 4 --max 4 --file
/home/dongfei/cps212/cps212project/guestbook --machine ami-044fa56d --table
memcachedb --infrastructure ec2 --instance_type m1.large --keyname
cps212_2460_0 --force
About to start AppScale over a cloud environment with the ec2 tools with
instance type m1.large.
Run instances message sent successfully. Waiting for the image to start up.
[Sun Dec 12 20:35:56 -0500 2010] 1799.999987 seconds left until timeout...
[Sun Dec 12 20:36:19 -0500 2010] 1777.262273 seconds left until timeout...
/* time remaining log omitted */
[Sun Dec 12 20:40:50 -0500 2010] 1506.176784 seconds left until timeout...
[Sun Dec 12 20:41:13 -0500 2010] 1483.592267 seconds left until timeout...
[Sun Dec 12 20:41:36 -0500 2010] 1460.731472 seconds left until timeout...
Please wait for your instance to complete the bootup process.
Head node successfully created at ec2-67-202-41-213.compute-1.amazonaws.com.
It is now starting up memcachedb via the command line arguments given.
Generating certificate and private key
Copying over credentials for cloud
Starting server at ec2-67-202-41-213.compute-1.amazonaws.com
Please wait for the controller to finish pre-processing tasks.
This AppScale instance is linked to an e-mail address giving it administrator
privileges.
Enter your desired administrator e-mail address:
The new administrator password must be at least six characters long and can
include non-alphanumeric characters.
Enter your new password:
Enter again to verify:
Please wait for AppScale to prepare your machines for use.
AppController just started
Spawning up 1 virtual machines
Copying over needed files and starting the AppController on the other VMs
Setting up database configuration files
Starting up Load Balancer
Your user account has been created successfully.
Uploading cps212guestbook...
We have reserved the name cps212guestbook for your application.
cps212guestbook was uploaded successfully.
Please wait for your app to start up.
Your app can be reached at the following URL:
http://ec2-67-202-41-213.compute-1.amazonaws.com/apps/cps212guestbook
The status of your AppScale instance is at the following URL:
http://ec2-67-202-41-213.compute-1.amazonaws.com/status
Figure 6: A successful instance launch log