A Cloud Gaming System Based on User-Level
Virtualization and Its Resource Scheduling
Youhui Zhang, Member, IEEE, Peng Qu, Cihang Jiang, and Weimin Zheng, Member, IEEE
Abstract—Many believe the future of gaming lies in the cloud, namely Cloud Gaming, which renders an interactive gaming application
in the cloud and streams the scenes as a video sequence to the player over the Internet. This paper proposes GCloud, a GPU/CPU
hybrid cluster for cloud gaming based on user-level virtualization technology. Specifically, we present a performance model to analyze
server capacity and games' resource consumptions, which categorizes games into two types: CPU-critical and memory-io-critical.
Consequently, several scheduling strategies have been proposed to improve resource utilization and compared with others.
Simulation tests show that both the First-Fit-like and the Best-Fit-like strategies outperform the others; in particular, they are near
optimal in the batch-processing mode. Other test results indicate that GCloud is efficient: an off-the-shelf PC can support five high-end
video games running at the same time. In addition, the average per-frame processing delay is 8–19 ms under different image resolutions,
which outperforms other similar solutions.
Index Terms—Cloud computing, cloud gaming, resource scheduling, user-level virtualization
1 INTRODUCTION
CLOUD gaming provides game-on-demand services over
the Internet. This model has several advantages [1]: it
allows easy access to games without owning a game console
or high-end graphics processing units (GPUs); the game dis-
tribution and maintenance become much easier.
For cloud gaming, the response latency is the most essen-
tial factor in the quality of gamers' experience "on the
cloud". The number of games that can run on one machine
simultaneously is another important issue, which determines
whether this mode is economical and thus really practical. Thus, to
optimize cloud gaming experiences, CPU / GPU hybrid
systems are usually employed because CPU-only solutions
are not efficient for graphics rendering.
One of the industrial pioneers of cloud gaming, Onlive,1
emphasized the former: it allocated one GPU per instance for
high-end video games. To improve utilization, some other
service providers use virtual machine (VM) technology
to share the GPU among games running on top of VMs. For
example, GaiKai2 and G-cluster3 stream games from cloud
servers located around the world to internet-connected
devices. Since the end of 2013, Amazon EC2 has also provided
a service for streaming games based on VMs.4
More technical details can be acquired from non-
commercial projects. GamePipe [2] is a VM-based cloud
cluster of CPU/GPU servers. Its distinguishing characteristic
is that not only cloud resources but also the local resources of
clients can be employed to improve the gaming quality.
Another system, GamingAnywhere [3], uses user-level
virtualization technology. Compared with some solutions,
its processing delay is lower.
Besides, task scheduling is regarded as another key issue
to improve the utilization of resources, which has been veri-
fied in the high-performance GPU-computing fields [4], [5],
[6], [7]. However, to the best of our knowledge, the schedul-
ing research for cloud gaming has not received much
attention yet. One example based on VMs is VGRIS [8]
(including its successor VGASA [9]). It is a GPU-resource
management framework in the host OS and schedules the
virtualized resources of guest OSes.
This paper proposes the design of a GPU/CPU hybrid sys-
tem for cloud gaming and its prototype, GCloud. GCloud has
used the user-level virtualization technology to implement a
sandbox for different types of games, which can isolate more
than one game-instance from each other on a game-server,
transparently capture the game’s video/audio outputs for
streaming, and handle the remote client-device’s inputs.
Moreover, a performance model has been presented;
thus we have analyzed the resource consumptions of games
and the performance bottleneck(s) of a server through
extensive experiments using a variety of hardware
performance counters. Accordingly, several task-scheduling
strategies have been designed to improve server utilization
and have been evaluated respectively.
Different from related research, we focus on the guideline
of task assignment: on receiving a game-launch request, we
must judge whether a server is suitable to undertake the new
instance while still satisfying the performance requirements.
In addition, from the aspect of user-level virtualization
(there are existing user-level solutions, like GamingAnywhere
[3]), GCloud has its own characteristics:
1. http://www.onlive.com/
2. https://www.gaikai.com/
3. http://www.g-cluster.com/eng/
4. https://aws.amazon.com/game-hosting/
 The authors are with the Department of Computer Science and Technology,
Tsinghua University, Beijing, China. E-mail: {zyh02, zwm-dcs}@tsinghua.
edu.cn, shen_yhx@163.com, famousjch@qq.com.
Manuscript received 13 Nov. 2014; revised 11 May 2015; accepted 11 May
2015. Date of publication 14 May 2015; date of current version 13 Apr. 2016.
Recommended for acceptance by Y. Wang.
For information on obtaining reprints of this article, please send e-mail to:
reprints@ieee.org, and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TPDS.2015.2433916
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 27, NO. 5, MAY 2016 1239
1045-9219 ß 2015 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution
requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
First, it implements a virtual input layer for each of the
concurrently-running instances, rather than a system-wide one,
which can support more than one Direct3D game at the
same time. Second, it designs a virtual storage layer to trans-
parently store each client's configurations across all servers,
which has not been addressed by related projects.
In summary, the following contributions have been
accomplished:
1) Enabling technologies based on light-weight virtu-
alization are introduced, especially those behind GCloud's
characteristics. (Section 3)
2) To balance the gaming-responsiveness and costs, we
adopt a “just good enough” principle to fix the FPS
(frame per second) of games to an acceptable level.
Under this principle, a performance model is con-
structed to analyze resource consumptions of games,
which categorizes games into two types: CPU-critical
and memory-io-critical; thus several scheduling mech-
anisms have been presented to improve the utiliza-
tion and compared. In addition, different from
previous work focused on the GPU resource, ours
has found that the host CPU or the memory bus is
the system bottleneck when several games are run-
ning simultaneously. (Section 4)
3) Such a cloud-gaming cluster has been constructed,
which supports the mainstream game-types. Results
of tests show that GCloud is highly efficient: an off-
the-shelf PC can support up to five concurrently-run-
ning video games (each game's image resolution is
1024 × 768 and the frame rate is 30 FPS). The aver-
age per-frame processing delay is 8–19 ms under
different image resolutions, which can satisfy the
stringent delay requirement of highly-interactive
games. Tests have also verified the effects of our per-
formance model. (Section 5)
The remainder of this paper is organized as follows.
Section 2 presents the background knowledge of cloud gam-
ing as well as related work. Sections 3 and 4 are the main
part: the former introduces the user-level virtualization
framework and enabling technologies; the performance
model and its analysis method are given in the latter, as well
as the scheduling strategies. Section 5 presents the prototype
cluster and evaluates its performance. Section 6 concludes.
2 RELATED WORK
2.1 Cloud Gaming
Cloud gaming is a type of online gaming that allows
direct and on-demand streaming of game-scenes to
networked-devices, in which the actual game is running on
the server-end (main steps have been described in Fig. 1).
Moreover, to ensure interactivity, all of these serial oper-
ations must happen within milliseconds, which
critically challenges the system design.
The sum of these latencies is defined as the interaction delay.
Existing research [10] has shown that different types of
games impose different requirements.
One type of cloud-gaming solution is VM-based. For the
solutions based on VMs, Step 1 is completed in the guest OS
while the other server-end steps are accomplished by the host.
Barboza et al. [11] present such a solution, which provides
cloud gaming services and uses three levels of managers for
the cloud, hosts and clients. Some existing work, like GaiKai,
G-cluster, Amazon EC2 for streaming games and GamePipe
[2], also belongs to this category.
In contrast to VM-based solutions, the user-level solution
inserts the virtualization layer between applications and the
run-time environment. This mode simplifies the processing
stack; thus it can reduce the extra overhead. GamingAny-
where [3] is such a user-level implementation, which sup-
ports Direct3D/SDL games on Windows and SDL games on
Linux.
Some solutions have enhanced the thin-client protocol to
support interactive gaming applications. Depending on the
concrete implementation, they can be classified into the two
types above. For example, Winter et al. [12] have enhanced the
thin-client server driver to integrate a real-time desktop
streamer to stream the graphical output of applications after
GPU processing, which can be regarded as a light-weight
virtualization-based solution. In contrast, Muse [13] uses
VMs to isolate and share GPU resources on the cloud-end,
which has enhanced the remote frame buffer (RFB) protocol
to compress the frame-buffer contents of server-side VMs.
However, these researches have focused on the optimiza-
tion of interaction delay, namely, taken care of the perfor-
mance of a single game on the cloud, rather than the
interference between concurrently-running instances. More-
over, none of these systems has presented any specific
scheduling strategy.
2.2 Resource Scheduling
For high performance computing (HPC), GPU virtualization
has been widely researched [14], [15], [16] for general pur-
pose computing. From the scheduling viewpoint, there are
also several studies, including Phull et al. [4], Ravi et al.
[5], Elliott and Anderson [6], Chen et al. [7] and Bautin
et al. [17].
Fig. 1. The whole workflow of cloud-gaming.
However, none of these researches has considered the
cloud gaming characteristics, including the critical demand
on processing latencies, highly-coupled sequential opera-
tions and so on.
The work on the scheduling for cloud gaming is limited:
VGRIS [8] and its successor VGASA [9] are resource man-
agement frameworks for VM-based GPU resources, which
have implemented several scheduling algorithms for differ-
ent objectives. However, they are focused on scheduling
rendering tasks on a GPU, without considering other tasks
like image-capture / -encoding, etc. iCloudAccess [18] has
proposed an online control algorithm to perform gaming-
request dispatching and VM-server provisioning to reduce
latencies of the cloud gaming platform. A recent work is
[19], which has studied the optimized placement of cloud-
gaming-enabled VMs. The proposed heuristic algorithms
are efficient and nearly optimal. Ours can be regarded as
complementary to these researches, because they are
focused on the VM-granularity dispatching / provisioning
while we pay attention to issues inside an OS.
One related work on GPU-scheduling (but not cloud-gam-
ing-specific) is TimeGraph [20]: it is a real-time GPU scheduler
that has modified the device-driver for protecting important
GPU workloads from performance interference. Similarly, it
has not considered the cloud gaming characteristics.
Another category of related researches [21], [22] is con-
cerning the streaming media applications. For example,
Cherkasova and Staley [21] developed a workload-aware per-
formance model for video-on-demand (VOD) applications,
which is helpful for measuring the capacity of a streaming server
as well as the resource requirements. We have referred to their
design principles to construct our performance model.
2.3 Others
To improve the processing efficiency and adaptation, Wang
and Dey [23] propose a rendering adaptation technique to
adapt the game rendering parameters to satisfy Cloud
Mobile Gaming's constraints. Klionsky [24] has presented
an architecture which amortizes the cost of rendering
across users. However, these two technologies are not
transparent to games.
In addition, Jurgelionis et al. [25] explored the impact of
networking on gaming; Ojala and Tyrvainen [26] developed
a business model behind a cloud gaming company.
As a summary, compared with the above-mentioned
work, GCloud has the following features:
1) It is based on user-level virtualization. Compared
with existing user-level solutions, GCloud has
proposed more thorough solutions for the virtual
input / storage.
2) From the aspect of performance modeling and sched-
uling, more real jobs (including image capture, encod-
ing, etc.) have been considered (compared with
VGRIS / VGASA [8], [9]). In addition, we use hard-
ware-assisted video encoding to mitigate the inter-
ference between games and to improve the performance.
3) Last but not least, our work is focused on related
issues inside a node, while [18], [19] work at the
VM granularity.
4) Furthermore, quite a few researches have been car-
ried out to measure the performance of cloud gam-
ing systems, like [27], [28], [29] and [30]. We also
referred them to complete our measurements.
3 SYSTEM ARCHITECTURE AND ENABLING
TECHNOLOGIES
3.1 The Framework
The system (in Fig. 2) is built with a cluster of CPU / GPU-
hybrid computing servers; a dedicated storage server is
used as the shared storage. Each computing server can host
the execution of several games simultaneously. One of these
servers is employed as the manager-node, which collects
real-time running information of all servers and completes
management tasks, including the task-assignment, user
authentication, etc.
It is necessary to note that the framework in Fig. 2 is for
small / medium system scales. For a large-scale system
with many users, a hierarchical architecture is needed to
avoid the bottleneck of information exchange. In fact,
because the quality of gamers' experience highly depends
on the response latency, and the latter is sensitive to the
physical distance between clients and servers, the architec-
ture may be geographically distributed, which is out of the
scope of this paper. It also means that in one site the scale
will not be very large.5
Initially, gaming-agents on available computing servers
register to the manager, indicating that they are ready and
Fig. 2. System architecture.
5. According to OnLive, the theoretical upper bound of the distance
between a user and a cloud gaming server is approximately 1,000 miles.
In China, some gaming systems provide services for just one city or sev-
eral cities.
which games they can execute. When a client wants to play
some game, the manager will search for candidates among
the registered information. After such a server has been cho-
sen, a start-up command will be sent to the corresponding
agent to boot up the game within a light-weight virtualiza-
tion environment. Then, its address will be sent to the client.
Future communication will be done directly between the
two ends.
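The register-and-dispatch flow above can be sketched as follows. This is an illustrative sketch only: the class and method names (Manager, GameServer, dispatch) are ours, and the least-loaded choice merely stands in for the capacity-based assignment strategies of Section 4.

```python
# Illustrative sketch of the manager's dispatch flow. Names are our own,
# not GCloud's actual API; server selection is a simple least-loaded
# placeholder for the capacity-based strategies described later.

class GameServer:
    def __init__(self, address, supported_games):
        self.address = address
        self.supported = set(supported_games)
        self.load = 0  # running instances, updated by the agent's periodic reports

class Manager:
    def __init__(self):
        self.servers = []

    def register(self, server):
        # Gaming-agents register, declaring which games they can execute.
        self.servers.append(server)

    def dispatch(self, game):
        # Search the registered information for candidate servers.
        candidates = [s for s in self.servers if game in s.supported]
        if not candidates:
            return None
        chosen = min(candidates, key=lambda s: s.load)
        chosen.load += 1          # boot the game in its sandbox (elided)
        return chosen.address     # the client then talks to this server directly
```

After dispatch returns the server's address, all further communication happens directly between the client and that server, matching the text above.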
During the run time, each agent collects local runtime
information and sends it to the manager periodically; the
latter can get the latest status of resource-consumptions.
The storage server plays an important role in providing
the personalized game configuration for each user. For
instance, User A had played Game B on Server C. Now A
wants to play the game again, but the manager finds that
Server C's resources have been depleted. The task then has
to be assigned to another server, D. Consequently, it is nec-
essary to restore A's configurations of B on D, including the
game's progress and other customized information. The
storage server serves as the shared storage for all com-
puting nodes.
3.2 The User-Level Virtualization Environment
For each game, API Interception is employed to implement
a lightweight virtualization environment. API interception
means to intercept calls from the application to the underly-
ing running system. The typical applications include soft-
ware streaming [31], [32], etc. Here it is used to catch the
corresponding resource-access APIs from the game. In addi-
tion, our main target platform is MS Windows as Windows
dominates the PC video-game world.
3.2.1 Image Capture
Usually, gaming applications employ the mainstream 3D
computer-graphics-rendering libraries, like Direct3D or
OpenGL, to complete the hardware (GPU) acceleration;
GCloud supports both of them.
In the case of Direct3D, the typical workflow of a game is
usually an endless loop: First, some CPU computation pre-
pares the data for the GPU, e.g., calculating objects in the
upcoming frame. Then, the data is uploaded to the GPU
buffer and the GPU performs the computation, e.g., render-
ing, using its buffer contents and fills the front buffer. To
fetch contents of the image into the system memory for the
consequent processing, we intercept the Direct3D’s Present
API.
For OpenGL, we have intercepted the Present-like API in
OpenGL, glutSwapBuffers, to capture images.
For other games based on the common GUI window, we
just set a timer for the application’s main window, then we
intercept the right message handler to capture the image of
the target window periodically.
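The Present-interception idea can be illustrated with a plain-Python wrapper: the hook copies each frame out for the encoding pipeline before the original call displays it. The Renderer class is a stand-in of our own, not the real Direct3D COM interface that GCloud actually patches.

```python
# Conceptual sketch of Present-interception. In the real system the hook
# is installed on the Direct3D Present API; here a stand-in Renderer class
# demonstrates the wrap-capture-forward pattern.

captured_frames = []

class Renderer:                        # stand-in for a game's rendering object
    def present(self, frame):
        return frame                   # "display" the frame

def hook_present(renderer):
    original = renderer.present
    def present_hook(frame):
        captured_frames.append(frame)  # fetch the image into system memory
        return original(frame)         # then let the game display it as usual
    renderer.present = present_hook    # install the interception
```

The same pattern applies to OpenGL's glutSwapBuffers mentioned above: wrap the call, capture, then forward.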
3.2.2 Audio Capture
Capturing of audio data is a platform-dependent task.
Because our main target platform is MS Windows, we inter-
cept Windows Audio Session APIs to capture the sound.
Core Audio serves as the foundation of quite a few higher-
level APIs; thus this method can bring about the best
adaptability.
3.2.3 Virtual Input Layer
Flash-based or OpenGL-based applications are usually
using the window’s default message-loop to handle inputs.
Thus, the solution is straightforward: We inject a dedicated
input-thread into the intercepted game-process. On recep-
tion of any control command from the client, this thread
will convert it into a local input message and send it to the
target window.
For Direct3D-based games, the situation is more compli-
cated. The existing work [3] replays input events using the
SendInput API on Windows. However, SendInput inserts
events into a system-wide queue, rather than the queue of a
specific process. So, it is difficult to support more than one
instance for the non-VM solution. To conquer this problem,
we intercepted quite a few DirectInput APIs to simulate
input-queues for any virtualized application; thus the user’s
input can be pushed into these queues and made accessible
to applications.
3.2.4 Virtual Storage Layer
From the storage aspect, a program can be divided into
three parts [31]: Parts 1 and 2 include all resources provided by
the OS and those created/modified by the installation pro-
cess; Part 3 is the data created/modified/deleted during the
run time, which contains the game configurations of each user.
For the immutable parts, it is relatively easy to distribute
them to servers through some system clone method. The
focus is how to migrate resources of Part 3 across servers to
provide personalized game-configurations for users.
We construct a virtual storage layer by the interception of
file-system and registry accessing APIs of all games. During
the run time, the resource modified by the game instance
will be moved into Part 3. When the previously-described
case in Section 3.1 occurs, the virtual storage layer of Game
B on the current server can redirect resource-accesses to the
shared storage to visit the latest configurations of User A,
which were stored by the last run on Server C.
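The redirection of Part 3 accesses can be sketched with a key-value stand-in for the shared storage server; the function names and the (user, path) keying are our own illustration, not GCloud's actual on-disk layout.

```python
# Sketch of virtual-storage redirection for Part 3 (per-user mutable data).
# A write by a game instance is redirected to the shared store keyed by
# user, so a later read on ANY server sees the latest configuration.

shared_storage = {}   # stands in for the dedicated storage server

def store_write(user, path, data):
    # Intercepted file-system / registry write, redirected to shared storage.
    shared_storage[(user, path)] = data

def store_read(user, path):
    # Intercepted read: returns the user's latest configuration, regardless
    # of which computing server it was written from.
    return shared_storage.get((user, path))
```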
4 PERFORMANCE MODEL AND TASK SCHEDULING
As mentioned in Section 1, the response latency and the
number of games that one machine can execute simulta-
neously are both essential to a cloud gaming system. To a
large extent, they are in contradiction and existing systems
(like [3], [11], [12]) usually focus on the first issue.
However, it is not always economical. For example, if the
FPS of a given game is too high, it will consume more
resources. Moreover, lossy compression will counteract
the high video quality to a certain extent.
Some scheduling work, like VGRIS / VGASA [8], [9], has
presented multi-task scheduling strategies. There are several
essential differences between our work and VGRIS / VGASA:
First, they focus on how to schedule existing games on a
server, including the allocation of enough GPU resources for
a game, etc. In contrast, GCloud focuses on the assignment
of a new task. Second, they focus on the GPU resource
and no other operation (like image capture, encoding,
etc.) has been considered, while our tests (presented in
Section 4.4) show the host CPU or the memory bus is the
bottleneck. Third, VGRIS and VGASA are VM-specific.
In this paper, we adopt a “just good enough” strategy,
which means that we just keep the game quality at some
acceptable level and then we try to satisfy the interactivity
requests of as many games as possible. Hence, there are two
main issues:
Issue 1: For a given server and its running game instances,
how to make sure the game quality is acceptable?
Issue 2: On an incoming request, which server is suitable to
launch the new game instance?
For Issue 1, we first give a brief pipeline model for cloud
gaming, which can be used to judge whether the game qual-
ity is acceptable or not. Second, a method to fix the FPS has
been presented to provide the "just good enough" quality;
a hardware-assisted video encoding technique has also
been used to further mitigate the interference between games.
For Issue 2, several resource-metrics have been given. Then
we carry out tests to measure the server capacity and to cat-
egorize games into different types. Accordingly, we design
a server capacity model and corresponding task-assignment
strategies. These strategies have been compared with others.
4.1 Game Quality
A cloud gaming system's interaction delay contains three
parts [27]: (1) Network delay, the time required for a round
of data exchange between the server and client; (2) Play-out
delay, the time required for the client to handle the received
frames for playback; (3) Processing delay, the time required for
the server to process a player's command, and to encode and
send the corresponding frame back.
This paper is mainly about the server side, and the net-
work is assumed to provide sufficient bandwidth; thus we
focus on the processing delay, which should be confined to
a limited range. The work [25] on measuring the latency of
cloud gaming has disclosed that, for some existing
service providers (like Onlive), the processing delay
is about 100–200 ms. Thus, we use 100 ms as our scheduling
target, denoted MAX_PD. Another key metric is FPS; the
required FPS is denoted FIXED_FPS. In this work,
FIXED_FPS is set to 30 by default.
As presented by Fig. 1, the gaming workflow can be
regarded as a pipeline including four steps: operations of
gaming logic, graphic rendering (including the image cap-
ture), encoding (including the color-space conversion) and
transmission. In addition, our tests show that given the suf-
ficient bandwidth, the delay of transmission is much less
than other steps. Thus, the fourth step can be skipped and
we focus on the remaining three.
Furthermore, the first two steps are completed by the
intercepted process, which is transparent to us; thus we
should combine them together and the sum of these laten-
cies is denoted by Tpresent. The average processing time
of the encoding step is denoted by Tencoding (The pipeline is
presented in Fig. 3). Hence, if the following conditions
(referred as Responsiveness Conditions) have been satisfied,
the requirement on the FPS and processing delay will be
met undoubtedly. To be more precise, satisfaction of the
first two conditions means the establishment of the last one,
under the default case.
Tpresent <= 1/FIXED_FPS and (1)
Tencoding <= 1/FIXED_FPS and (2)
Tencoding + Tpresent <= MAX_PD (3)
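The Responsiveness Conditions can be encoded directly as a predicate; this is a minimal sketch assuming times in seconds, with the function name and argument defaults being our own choices.

```python
# Direct encoding of Responsiveness Conditions (1)-(3). All times are in
# seconds; the constants follow the defaults given in the text.

FIXED_FPS = 30
MAX_PD = 0.100  # 100 ms scheduling target

def responsive(t_present, t_encoding, fps=FIXED_FPS, max_pd=MAX_PD):
    frame_budget = 1.0 / fps            # per-frame time budget, 1/FIXED_FPS
    return (t_present <= frame_budget and     # condition (1)
            t_encoding <= frame_budget and    # condition (2)
            t_present + t_encoding <= max_pd) # condition (3)
```

With the default 30 FPS and 100 ms target, conditions (1) and (2) each allow about 33 ms, so their satisfaction implies condition (3), matching the remark above.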
4.2 Fixed FPS
To provide the “just good enough” gaming quality, the FPS
value should be fixed to some acceptable level (Issue 1).
Because the interface of GPU drivers is not open, our solu-
tion is in the user-space, too.
Take the Direct3D game as an example, we intercept the
Present API to insert a Sleep call for adjusting the loop
latency: The rendering complexity is mostly affected by the
complexity of gaming scenes and the latter changes gradu-
ally. Thus, it is reasonable to predict Tpresent based on its
own historical information. In the implementation, the aver-
age time (denoted Tavg present) of the past 100 loops is used
as the prediction for the upcoming one (the similar method
has been adopted by [8], [9]) and the sleep time (Tsleep) is cal-
culated as:
Tsleep = 1/FIXED_FPS − Tavg_present
The true problem lies in how to judge whether a busy
server is suitable to undertake a new game instance or not.
Thus, we should solve Issue 2 anyway.
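The sleep-time adjustment above can be sketched as follows. The 100-loop averaging window follows the text; the function name and the clamping at zero (for frames that already exceed the budget) are our own assumptions.

```python
from collections import deque

# Sketch of the fixed-FPS mechanism: predict Tpresent from the average of
# the past 100 loops, then sleep away the remainder of the frame budget.
# In the real system this runs inside the intercepted Present call.

FIXED_FPS = 30
history = deque(maxlen=100)   # Tpresent samples from the past 100 loops

def sleep_time(t_present):
    history.append(t_present)
    t_avg = sum(history) / len(history)   # Tavg_present
    # Tsleep = 1/FIXED_FPS - Tavg_present, clamped at 0 when the frame
    # already took longer than the budget.
    return max(0.0, 1.0 / FIXED_FPS - t_avg)
```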
4.3 Hardware-Assisted Video Encoding
The fixed FPS can mitigate the interference between games
because it allocates just enough resource for rendering. Fur-
ther, we use the hardware-assisted video-encoding capabil-
ity of commodity CPUs for even less interference.
The hardware technology of Intel CPUs, Quick Sync, has
been employed. It has a full-hardware pipeline to compress
raw images in the RGB or YUV format into H.264 video.
Quick Sync has become one of the mainstream hardware
encoding technologies.6 On the test server, a Quick-Sync-
enabled CPU can simultaneously support up to twenty
30-FPS encoding tasks (at an image resolution of
1024 × 768); the latency for one frame is as low as 4.9 ms.
Fig. 3. Gaming pipeline.
6. Quick Sync was introduced with the Sandy Bridge CPU micro-
architecture. It is part of the processor graphics integrated on the same
die as the CPU. Thus, to make it work with a discrete graphics card
(used for gaming), some special configuration should be set up as
described by http://mirillis.com/en/products/tutorials/action-tutorial-
intel-quick-sync-setup_for_desktops.html. For AMD, its Accelerated
Processing Unit (APU) has a similar function.
Moreover, the CPU utilization of one such task is almost
negligible, less than 0.2 percent. (Details are presented in
Appendix A, which can be found on the Computer Society
Digital Library at http://doi.ieeecomputersociety.org/
10.1109/TPDS.2015.2433916.) The result means it causes lit-
tle interference to other tasks. Thus, we use it as the refer-
ence implementation in all following tests, as well as in the
system prototype.
4.4 Resource-Metrics
Five types of system resources have been focused on:
the CPU, GPU, system RAM, video RAM and the
system bandwidth. The first two are denoted by utiliza-
tion ratios; the next two are represented by memory con-
sumptions; and the last refers to the miss number of the LLC
(Last Level Cache). Correspondingly, the server capacity
and the average resource requirements of a game (under
the condition satisfying the Responsiveness Conditions) can be
denoted by a five-item tuple <U_CPU, U_GPU,
M_HOST, M_GPU, B>.
Based on the above metrics, we should judge whether the
remaining resource-capacities of a server can meet the
demand of a new game or not. The key lies in how to mea-
sure the capacity of a server, as well as the game require-
ments. We present the following method to accomplish
these tasks, namely, to solve Issue 2.
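The admission test built on the five-item tuple can be sketched as a per-metric comparison; the dictionary representation and function name are our own illustration of the idea, not the strategies evaluated later.

```python
# Sketch of the admission test on the five resource metrics. A server can
# accept a new game only if, for EVERY metric, the game's average
# requirement fits into the server's remaining capacity. Field names
# follow the tuple <U_CPU, U_GPU, M_HOST, M_GPU, B> in the text.

METRICS = ("u_cpu", "u_gpu", "m_host", "m_gpu", "b")

def fits(server_capacity, server_used, game_req):
    return all(server_used[m] + game_req[m] <= server_capacity[m]
               for m in METRICS)
```

A single exhausted metric (e.g. the LLC-miss bandwidth B for a memory-io-critical game) is enough to reject a server, which is exactly the bottleneck behavior analyzed in Section 4.4.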
4.4.1 Test Methods
Commercial GPUs usually implement driver / hardware
counters to provide runtime performance information.
For example, NVIDIA's PerfKit APIs7 can collect
resource-consumption information of each GPU in real
time. Hence, we can get results accumulated since the previ-
ous time the GPU was sampled, including the percentage of
time the GPU is idle/busy, the consumption of graphics
memory, etc.
For commodity CPUs, a similar method has been used,
too. For instance, Intel has already provided the capability
to monitor performance events inside processors. Through
its performance counter monitor (PCM), a lot of perfor-
mance-related events per CPU core, including the number
of LLC misses, instructions per CPU cycle, etc., can be
obtained periodically.
The sample periods for CPU and GPU are both set to 3 s.
In addition, we embed monitoring codes into the inter-
cepted gaming APIs to record processing delays of each
frame, which will be used to judge whether the Responsive-
ness Conditions have been met or not.
Moreover, it is necessary to note that the integrated
graphics processor (which contains the Quick Sync encoding
engine) shares the LLC with the CPU cores and there is no on-
chip graphics memory.8 Thus the hardware encoding pro-
cess needs to access the system memory (if the required
data misses in the LLC), which means the corresponding
miss number is still suitable to indicate the memory
throughput with hardware encoding.
In addition, we select four representative games, includ-
ing three Direct3D video games (Direct3D is the most popu-
lar development library for PC video games) and one
OpenGL game. They are:
1) Need for Speed-Most Wanted (abbreviated to NFS).
It is a classic racing video game.
2) Modern Combat 2-Black Pegasus (abbreviated to
Combat), a first-person shooter video game.
3) Elder Scrolls: Skyrim-Dragonborn (abbreviated to
Scrolls), an action role-playing video game.
4) Angry Birds Classic: (abbreviated to Birds), the well-
known mobile-phone game’s PC version.
Several volunteers were invited to play games on
the cloud gaming system and encouraged to play quite a
few game scenes; the duration was more than 15 minutes
for each game. After several loops, runtime information can
be collected for further analysis.
4.4.2 Test Cases
A Windows 7 (64-bit) PC is used as the server, which is equipped
with an NVIDIA GTX780 GPU adapter (3 GB video mem-
ory), a common Core i7 CPU (four cores, 3.4 GHz) and 8 GB
RAM. By default, games are streamed at a resolution
of 1024 × 768 and the game picture quality is set to medium
in all cases; the FPS is fixed to 30. Video encoding is
completed by Quick Sync.
Single instance (Resource-requirement Tests). Each game has
been played in our virtualization environment alone and
resource consumptions are recorded in real-time. As
expected, Responsiveness Conditions can be met for each
game on the powerful machine; the corresponding
resource-requirements are presented as follows (Table 1).
Considering resource consolidation, the average value of
each item of the tuple has been used.
Multi-instances running simultaneously. Quite a few game
groups have been executed and sampled simultaneously.
For example, we have played 2–6 NFS instances at the
same time. Based on the runtime information, we can see
that this server can support up to five acceptable instances
simultaneously (we consider a game's running quality
acceptable if its average FPS value is not less than 90 per-
cent of the FIXED_FPS). While six instances are running, the
FPS value is less than 27, which is regarded as unacceptable.
Furthermore, we should identify the bottleneck that is pivotal for task assignment. Considering the following facts (in Fig. 4a), NFS is memory-io-critical:
When no more than five games run simultaneously, the average FPS is stable (about 30.3) and the million-miss-number-per-second value increases almost linearly. With six instances running, the FPS is about 24.7 and the throughput remains nearly unchanged (from 37.6 to 37.9). At the same time, both U_GPU and U_CPU are far from exhausted, at 47 and 71 percent respectively. This phenomenon indicates that memory accesses have impeded tasks from utilizing the CPU/GPU resources efficiently. Moreover, memory consumption is not the bottleneck, so no swap operations occur (for clarity, the information on memory consumption is omitted from these figures).
For Combat and Scrolls (in Figs. 4b and 4c), the same conclusion holds: under the condition of satisfying
7. http://www.nvidia.com/object/nvperfkit_home.html
8. http://www.hardwaresecrets.com/printpage/Inside-the-Intel-Sandy-Bridge-Microarchitecture/1161
1244 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 27, NO. 5, MAY 2016
performance requirements, there can be at most three concurrent instances of Scrolls. For Combat, the maximum number of instances is five. At the same time, both U_GPU and U_CPU are limited, too. On the other hand, Birds (in Fig. 4d) is CPU-critical because it can exhaust the CPU (97 percent with 10 instances running and an average FPS of 27.1), while the million-miss-number-per-second value increases almost linearly.
4.4.3 Modeling
Based on the previous results, we have normalized the resource requirements and the server capacity; the principle is critical-resource-first: (1) For a memory-io-critical game of which the game-server can host N_i instances, the fifth item (Bandwidth) of its tuple is set to MAX_SYSTEM_THROUGHPUT / N_i (see footnote 9), regardless of the absolute value. (2) For any CPU-critical game of which the game-server can host N_j instances, its U_CPU is set to 100% / N_j. (3) The other tuple items are kept unchanged.
For example, the tuple of NFS is <9.15%, 2.01%, 526, 220, MAX_SYSTEM_THROUGHPUT / 5>, and the Birds tuple is <100% / 10, 1.1%, 181, 142, 6.54>. Tuples of these four games are listed in Table 2.
Then for a set of M games (each denoted as Game_i, 0 ≤ i < M), if the sum of each kind of resource consumption is less than the corresponding system capacity, we consider that these games can run simultaneously and smoothly.
Formally, we use the following notations:
<U_CPU_game_i, U_GPU_game_i, M_HOST_game_i, M_GPU_game_i, B_game_i>: the tuple of resource requirements of Game_i;
<100%, 100%, SERVER_RAM_CAPACITY, SERVER_VIDEO_RAM_CAPACITY, MAX_SYSTEM_THROUGHPUT>_server: the capacity of a given server.
If the following conditions have been met, this server can host all games of the set running simultaneously:
Σ_{0 ≤ i < M} U_CPU_game_i ≤ 100%
Σ_{0 ≤ i < M} U_GPU_game_i ≤ 100%
Σ_{0 ≤ i < M} M_HOST_game_i ≤ SERVER_RAM_CAPACITY
Σ_{0 ≤ i < M} M_GPU_game_i ≤ SERVER_VIDEO_RAM_CAPACITY
Σ_{0 ≤ i < M} B_game_i ≤ MAX_SYSTEM_THROUGHPUT
Fig. 4. FPS and resource-consumptions of games.
TABLE 1
Resource-Requirements of Each Game

         U_CPU (%)  U_GPU (%)  M_HOST (MB)  M_GPU (MB)  B (million miss-number per second)
NFS      9.15       2.01       526          220         8.10
Scrolls  14.55      7.02       795          560         13.52
Combat   8.47       3.27       800          296         7.97
Birds    9.36       1.1        181          142         6.54
9. MAX_SYSTEM_THROUGHPUT refers to the maximal LLC-miss-number per second that the system can sustain. It can be evaluated by a specially-designed program that accesses the memory space randomly and intensively.
ZHANG ET AL.: A CLOUD GAMING SYSTEM BASED ON USER-LEVEL VIRTUALIZATION AND ITS RESOURCE SCHEDULING 1245
For example, one Scrolls, one Combat and two NFS can run at the same time; if an extra NFS joins, the conditions are no longer met and the bottleneck is B. Quite a few tests of real games are given in Section 5.1 to verify this design.
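This admission check can be sketched as follows; the game tuples follow Table 2, while CAPACITY, MAX_THROUGHPUT and the normalization are illustrative assumptions, not measured values:

```python
# Admission check: can a set of game instances share one server? (sketch)
# Tuple order: (U_CPU, U_GPU, M_HOST_MB, M_GPU_MB, B).
MAX_THROUGHPUT = 100.0  # assumed max LLC-misses (millions/s) the system sustains
CAPACITY = (1.0, 1.0, 8192, 3072, MAX_THROUGHPUT)  # assumed server capacity

GAMES = {
    "NFS":     (0.0915, 0.0201, 526, 220, MAX_THROUGHPUT / 5),
    "Scrolls": (0.1455, 0.0702, 795, 560, MAX_THROUGHPUT / 3),
    "Combat":  (0.0847, 0.0327, 800, 296, MAX_THROUGHPUT / 5),
    "Birds":   (1.0 / 10, 0.011, 181, 142, 6.54),
}

def can_host(game_names, capacity=CAPACITY):
    """True iff, for every resource, the summed demand fits the capacity."""
    totals = [0.0] * len(capacity)
    for name in game_names:
        for k, v in enumerate(GAMES[name]):
            totals[k] += v
    return all(t <= c for t, c in zip(totals, capacity))
```

With these values, one Scrolls, one Combat and two NFS pass the check, while a third NFS fails on the bandwidth term, in line with the example above.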
4.5 The Scheduling Strategy
In conclusion, the procedure for task assignment contains two stages.
Ready-Stage: when a game is brought on-line, it is tested to obtain its resource requirements. Then, for any game (denoted as Game_i), a tuple <U_CPU, U_GPU, M_HOST, M_GPU, B>_game_i is given to represent its requirements.
In addition, for any Server_j, its capacity is denoted as <U_CPU, U_GPU, M_HOST, M_GPU, B>_server_j. The corresponding test-process has been described in the previous paragraphs and each element is labeled as the corresponding maximum capacity.
Runtime-Stage: During run time, the current resource consumptions of each server (denoted as <U_CPU, U_GPU, M_HOST, M_GPU, B>_server_j_cur; in our prototype, the average values of the latest one minute are used) are sampled periodically.
Moreover, the main goal of our scheduling strategy is to minimize the number of servers used, which can be regarded as a bin-packing problem. Several theoretical studies [33], [34] have shown that the First-fit and Best-fit algorithms behave well for this problem, especially for the online version with requests inserted in random order [34]. Thus, we have designed two heuristic task-scheduling algorithms based on the well-known First-fit and Best-fit strategies, namely first-fit-like (FFL) and best-fit-like (BFL). The principle is straightforward, so we only give their outlines here:
In FFL, for a given request of game_i, all servers are checked in order; if one server (for example, server_j) can host the new game, which means that no kind of resource consumption of all games on server_j (including game_i) exceeds the capacity, the algorithm ends successfully.
In BFL, the procedure is similar. The difference lies in that, if there is more than one suitable server, the one that would leave the least amount of the critical resource is chosen.
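The two heuristics can be outlined as follows (a sketch; the representation of servers as tuples of remaining resources, and the choice of the last tuple element as the critical resource, are assumptions based on the model above):

```python
# First-fit-like (FFL) and best-fit-like (BFL) placement (sketch).
# Each server is a tuple of remaining resources; each game a tuple of demands.

def fits(game, free):
    """True iff every demand fits into the remaining resources."""
    return all(g <= f for g, f in zip(game, free))

def first_fit(servers, game):
    """Place game on the first server with enough free resources; -1 if none."""
    for j, free in enumerate(servers):
        if fits(game, free):
            servers[j] = tuple(f - g for f, g in zip(free, game))
            return j
    return -1

def best_fit(servers, game, critical=-1):
    """Among feasible servers, pick the one leaving the least critical resource."""
    best, best_left = -1, None
    for j, free in enumerate(servers):
        if fits(game, free):
            left = free[critical] - game[critical]
            if best_left is None or left < best_left:
                best, best_left = j, left
    if best >= 0:
        servers[best] = tuple(f - g for f, g in zip(servers[best], game))
    return best
```

If no server can host the request, both routines return -1, which corresponds to starting a new server in the simulation.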
4.5.1 Tests with Artificial Traces
We have simulated our algorithms in two situations:
1) Several requests of the four games come simulta-
neously and must be dispatched instantly, namely,
in the batch processing mode.
2) Requests come one by one. The request sequence follows a Poisson process with a mean inter-arrival time of 5 seconds; the duration of each game also follows a Poisson process with a mean of 40 minutes.
In both situations, we assume that there are enough servers and each has an initial resource usage of <10%, 5%, 3096, 512, 0> (gathered from our real servers). Thus, we can start a new server whenever needed. Moreover, from the aspect of resource usage, we mainly focus on the number of servers used by each algorithm.
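Such a request trace can be generated by drawing exponential inter-arrival times (the defining property of a Poisson arrival process) and exponential durations; a minimal sketch with illustrative names and seed:

```python
import random

def make_trace(n, mean_interval=5.0, mean_duration=2400.0, seed=42):
    """Generate n (arrival_time, duration) pairs in seconds.
    Poisson arrivals correspond to exponential inter-arrival times."""
    rng = random.Random(seed)
    t, trace = 0.0, []
    for _ in range(n):
        t += rng.expovariate(1.0 / mean_interval)   # mean 5 s between requests
        d = rng.expovariate(1.0 / mean_duration)    # mean 40 min per game
        trace.append((t, d))
    return trace
```

Each generated pair can then be fed to the simulated scheduler in arrival order.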
For the first situation, we have compared our algorithms
with three others:
Size-based task assignment (STA) [35]: This algorithm is widely used in distributed systems; all tasks whose resource requirements fall within a given size range are assigned to a particular server. Specific to our case, two types of servers are designated (for CPU-critical and for memory-io-critical games respectively).
Packing algorithm (PA): a greedy algorithm. Each server is assigned as many games as possible until all the games have been dispatched.
Dominant resource fairness (DRF) [36]: a fair-sharing model that generalizes max-min fairness to multiple resource types. In our implementation, the collection of all currently-used servers (called small servers) is regarded as one big server. Whether the big server can satisfy an incoming request depends on whether there exists such a small server; if not, a new small server is added to enlarge the big one. The scheduling strategy inside the big server is First-fit and all gaming requests are considered to be issued by different users.
We also estimate the ideal server-number for reference. For each kind of resource (denoted by s), the minimum server number is
⌈ Σ_{i=1}^{n} R_i^s / R^s ⌉.
Here n is the total number of game requests, R_i^s denotes the consumption of resource s by the i-th game, and R^s is the corresponding capacity of a server. The maximum of these per-resource minimums is the ideal number.
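This ideal reference number amounts to, per resource, the ceiling of the total demand divided by the per-server capacity, maximized across resources; a minimal sketch (names assumed):

```python
import math

def ideal_servers(demands, capacity):
    """Lower bound on the server count for a set of requests.
    demands: list of per-game resource tuples; capacity: per-server tuple."""
    totals = [sum(d[s] for d in demands) for s in range(len(capacity))]
    # For each resource, ceil(total demand / capacity); take the maximum.
    return max(math.ceil(t / c) for t, c in zip(totals, capacity))
```

This is only a packing lower bound: a real placement may need more servers because demands are not divisible across machines.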
In the second situation, our algorithms have been compared with the STA algorithm only, because the other algorithms require information about the request sequence (which is unavailable in this case) and would degenerate into FFL.
Simulation results of Situation 1 are given in Fig. 5. The
y-axis stands for the needed-server numbers (for clarity,
TABLE 2
Resource-Requirements of Games

         Tuple                                                 Game type
NFS      <9.15%, 2.01%, 526, 220, MAX_SYSTEM_THROUGHPUT / 5>   memory-io-critical
Scrolls  <14.55%, 7.02%, 795, 560, MAX_SYSTEM_THROUGHPUT / 3>  memory-io-critical
Combat   <8.47%, 3.27%, 800, 296, MAX_SYSTEM_THROUGHPUT / 5>   memory-io-critical
Birds    <10%, 1.1%, 181, 142, 6.54>                           CPU-critical
Fig. 5. Server-numbers in Situation 1.
values have been normalized) as several requests arrive simultaneously (the request number is given on the x-axis). Compared with the others, the heuristic algorithms perform quite well; even against the ideal number, they are close to optimal (the maximum is 101.23 percent of the ideal). Moreover, the two algorithms perform almost equally in all cases.
Fig. 6 shows the number of requested servers when requests arrive in sequence (Situation 2). Our heuristic algorithms are more efficient than the STA. The two algorithms also perform similarly in all cases: compared with the BFL algorithm, the extra resources consumed by the FFL are less than 3.6 percent (57:55). Finally, the results show that the FFL is about 20 percent faster than the BFL, while both are fast enough (in the batch processing mode, both can complete the task-assignment within several milliseconds for 1,000 requests).
4.5.2 Tests with Real Game-Traces
To further evaluate the proposed task-scheduling strategies, we conduct a trace-driven simulation of a large-scale cluster (a similar simulation method was used in [37]); each server is the same as the one described in Section 4.4. The dataset we used is the World of Warcraft history dataset provided by Yeng-Ting Chen et al. [38]. Although this dataset is based on the MMORPG World of Warcraft, we believe it is useful in our case because cloud gaming and MMORPGs share many similarities, such as wide variations in gaming time, a huge bandwidth demand and a large number of concurrent users. Of course, necessary pre-processing was introduced to make the dataset more suitable: we mapped the first four races in the dataset (Blood Elf, Orc, Tauren and Troll) to the four games in our system, and the remaining one (the Undead) is mapped to one of these four games randomly.
In detail, we have used traces of three months that consist of 396,631 game-requests (details are shown in Table 3). Accordingly, a cluster of 200 servers has been simulated, in which the master node collects the resource utilization of all servers every minute. Because previous tests have shown that the BFL and FFL policies perform similarly, only the BFL scheduling policy is tested here.
Fig. 7 shows the numbers of running game-instances, activated servers and used servers (once used, a server is regarded as a used server regardless of whether it is currently activated); there is an obvious linear relationship between the number of game-instances and the number of activated servers. Moreover, the average number of activated servers is 64, which is significantly less than the maximum number of used servers (152). This means the scheduling efficiency is good; it also means server consolidation [37] could be used to further reduce the number of servers.
Fig. 8 shows the average resource-utilizations of activated servers for each day. Although the utilization rates of the other resources are relatively low, the bandwidth utilization is high. This confirms that most games are memory-io-critical, which accords with our performance model.
We have carried out another simulation, in which the number of servers is unlimited, to illustrate the relationship between the total number of used servers and the update-interval of resource utilization.
Fig. 9 shows this relationship: when the update-interval is less than 20 minutes, the number of used servers varies only slightly; for larger intervals, the number increases significantly. This means we could use a longer update-interval with very limited impact on system efficiency. It is also helpful for managing a large-scale cloud gaming system, because message exchanges between server-agents and the manager are reduced considerably.
Fig. 6. Server-numbers in Situation 2.
TABLE 3
Details of the Dataset

Parameter                                       Value
Simulated period                                3 months
Server number                                   200
Total game requests                             396,631
Maximum game requests arriving simultaneously   227
Maximum game instances running simultaneously   757
Average lifetime of game instances              85 minutes
Average interval between game requests          3 minutes
Fig. 7. Running games and servers of each day.
4.6 Discussions
4.6.1 Different Game Configurations and/or
Heterogeneous Servers
The above work targets specific hardware and games, and we believe the method is practical: it is reasonable to assume that any game should be tested fully before going on-line; thus the resource requirements of each game can be measured on a given server whose hardware configuration remains unchanged for a long period.
If heterogeneous servers are used, since we have found that the host CPU or the memory bus is the system bottleneck, new servers' capacities can also be derived from a comparison of the CPU performance and system bandwidth between the reference servers and the new servers (these metrics may be published by the producer or can be measured), which avoids the exponentially-growing complexity of testing. Appendix B, available in the online supplemental material, gives an example showing that the capability of a new server for known games is predictable, and then summarizes the prediction method.
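An illustrative sketch of this prediction idea (the function and the linear-scaling assumption are ours, not the exact method of Appendix B): the maximum instance count of a known game on a new server is scaled from the reference count by the performance ratio of its critical resource.

```python
import math

def predict_instances(n_ref, game_type, cpu_ratio, bw_ratio):
    """Predict the max concurrent instances of a known game on a new server.
    n_ref: instance count measured on the reference server.
    game_type: "cpu" for CPU-critical, anything else for memory-io-critical.
    cpu_ratio / bw_ratio: new-vs-reference CPU-performance and memory-bandwidth
    ratios (assumed to be published or measured). Linear scaling is assumed."""
    ratio = cpu_ratio if game_type == "cpu" else bw_ratio
    return math.floor(n_ref * ratio + 1e-9)  # epsilon guards float rounding
```

For instance, a server with 1.2 times the memory bandwidth would be predicted to host six NFS instances instead of five, subject to the other resource limits still holding.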
For different game configurations, the situation is more complicated. Even if only the resolution differs, tests show that there is no obvious relationship between the resolution and resource consumption, although the consumption of our framework itself (such as encoding and image capture) is proportional to the resolution.
Therefore, our solution is: during the real operation period, such configurations can be evaluated on-line first. For example, we can schedule the same game with the same configuration to some dedicated server(s) if a user demands it. With the accumulation of game runs, the metrics become more accurate.
4.6.2 Time-Dependent Factors
We use average values to denote the resource requirements of a given game. In reality, requirements are time-dependent and may vary in different gaming stages. However, we believe average values are sufficient owing to the following facts:
1) The degree of variation depends heavily on the time granularity. Our tests show that it becomes smaller as the time interval increases. When the time interval is 30 s (in Appendix C, available in the online supplemental material), the variation of requirements is relatively small.
2) Considering the resource consolidation of multiple concurrently-running games, the use of average values is reasonable.
Moreover, it should be noted that some games take a very long time to finish. Thus, in our experimental environment, it is difficult to explore plenty of scenes. However, such a game can be evaluated on-line first for data accumulation (as mentioned above).
5 IMPLEMENTATION AND EVALUATION
5.1 Implementation
We have implemented the cloud gaming system based on the user-level virtualization technology. Eight PC servers are connected by Gigabit Ethernet; their configurations are the same as the one in Section 4.4. Detours [39] is used to implement the required interception functions. In detail, we have implemented a DLL (called gamedll) that can be injected into any gaming process to wrap all APIs of interest and to spawn two threads, for input reception and data encoding/streaming respectively.
Now our virtualization layer can stream Direct3D games, OpenGL games and Flash games to Windows, iOS and Android clients, and receive remote operations. The UDT (UDP-based Data Transfer) protocol [40] is used to deliver the video/audio/operation data between the server and the client.
We use the periodical video capture as the timing reference on the server side; any audio data between two consecutive video-capture timestamps is delivered with the current video data.
Specifically, the Windows Audio Session APIs provide interfaces to create and manage audio streams to and from audio devices. Our interception replicates such stream buffers. After the current image has been captured, the audio data between the current read and write positions of the buffer (the read position is just the current playback position) is copied out immediately and sent with the current image. This method achieves video/audio synchronization and limits the timing discrepancy to roughly the reciprocal of the FPS value.
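The replication step can be sketched as a wrap-around copy from a circular stream buffer (a simplified model; the real implementation works on the Windows Audio Session buffers, and positions here are hypothetical byte offsets):

```python
def audio_slice(buf, read_pos, write_pos):
    """Copy the not-yet-played audio between the read and write positions
    of a circular stream buffer (positions in bytes)."""
    if write_pos >= read_pos:
        return bytes(buf[read_pos:write_pos])
    # The valid region wraps around the end of the circular buffer.
    return bytes(buf[read_pos:]) + bytes(buf[:write_pos])
```

The returned slice is what would be attached to the image captured at that instant.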
As mentioned in Section 4.1, an exception lies in that
games may decrease the FPS deliberately in some scenes,
which will cause more timing discrepancies. To remedy this
Fig. 8. Resource-utilizations of activated servers.
Fig. 9. Used servers of different update-intervals.
situation, a dedicated timer has been introduced to trigger audio transmission whenever the interval between successive frames exceeds a threshold.
Moreover, on the client side, to smooth the playback of received audio, one extra audio buffer is managed by the cloud-gaming client software. Any received audio is first stored in this buffer, appended to the existing data. Once the whole buffer has been filled, all of it is copied to the playback device. Thus, combined with the default buffer of the playback device, this constructs a double-buffering mechanism, which parallelizes playback and reception and thereby smooths the playback. As a consequence, any audio data is delayed for some time: in our system, the length of this buffer is set to hold 200 ms of audio data, which makes the playback smooth. Results are given in the next section.
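A minimal sketch of this client-side double-buffering (the class, the byte-based capacity and the playback callback are illustrative assumptions; 200 ms of audio corresponds to `capacity` bytes at the stream's byte rate):

```python
class AudioDoubleBuffer:
    """Client-side smoothing buffer (sketch). Received audio accumulates;
    once `capacity` bytes are pending, a full chunk is handed to the
    playback device in one copy, decoupling reception from playback."""

    def __init__(self, capacity, playback):
        self.capacity = capacity      # bytes per flushed chunk (e.g. 200 ms)
        self.playback = playback      # callable consuming one bytes chunk
        self.pending = bytearray()

    def receive(self, data):
        self.pending += data
        while len(self.pending) >= self.capacity:
            chunk = bytes(self.pending[:self.capacity])
            del self.pending[:self.capacity]
            self.playback(chunk)      # flush a full buffer to the device
```

The device's own buffer plays one chunk while the next accumulates, which is the double-buffering effect described above.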
5.2 Evaluation
The test environment and configurations are the same as
those in Section 4.4, as well as the testing method.
5.2.1 Overheads of the User-Level Virtualization
Technology Itself
We execute a game on a physical machine directly and record the game speed (in terms of the average FPS) and the average memory consumption. Then, the game is run in the user-level virtualization environment (all related APIs are intercepted but no real work, such as image capture or encoding, is enabled) and in a virtual machine respectively; the same runtime information is recorded.
The latest VMware Player 6 is employed and both the host and guest OSes are Win 7. The comparison is shown in Fig. 10 (for clarity, values have been normalized).
Considering the GPU utilization, the user-level technology itself introduces almost no performance loss, while the VM-based solution's efficiency is a little lower, about 90 percent of the native. On the other hand, the memory consumption of the VM-based solution is 2.4 times that of the native, because the memory occupied by the guest OS is considerable. For the user-level solution, this consumption is almost the same as the native, too.
5.2.2 Processing Performance of the Server
The processing procedure of a cloud-gaming instance can be divided into four parts: (1) image capture, which copies a rendered frame into the system memory, (2) video encoding, (3) transferring, which sends each compressed frame into the network, and (4) the game-logic processing and rendering. The last part depends mainly on the concrete game, while GCloud handles the others. Thus the first three are the object of this test, and the sum of their delays is denoted as SD (Server Delay).
Moreover, we intend to measure the performance limit. Hence only one instance runs on a server and the "try the best" strategy is used: no Sleep call is inserted, so the games run as fast as possible. Existing work [3] has performed a similar test for GamingAnywhere and OnLive, so we can compare our results with theirs. Although the games tested in [3] are different, we believe the comparison is meaningful because the server delay is largely independent of the specific game.
Fig. 11 reports the average SD of three video games
under different resolutions. The corresponding FPS is in
Fig. 10. Comparison of resource consumption.
Fig. 11. Processing performance and the decomposition (three
resolutions).
Fig. 12. The average value at 720P is given in Fig. 13, together with the corresponding values of GamingAnywhere and OnLive (values have been normalized).
Results show that, compared with similar solutions, GCloud achieves smaller SDs (ranging from 8 ms to 19 ms), which are positively correlated with the resolution. We attribute this mainly to the high encoding performance of Quick Sync. In contrast, the encoding delay of GamingAnywhere is about 14 to 16 ms per frame.
The transferring latency is smaller than the others by two orders of magnitude. Even in the following multi-game cases, this still holds. Thus, the transferring latency can be ignored, as proposed in Section 4.
5.2.3 Multiple Games
The "just good enough" strategy is used; a Sleep call fixes the FPS. First, an OpenGL game and three Direct3D games are played one by one and the processing delay (including the sleep time) is sampled periodically; the sample period is one frame. Second, quite a few game combinations, each including more than one game, are executed and sampled. Without loss of generality, FPS values of some game combinations played simultaneously are presented in Table 4, together with the average absolute deviations (AADs). These combinations are:
Case 1: Two NFS instances;
Case 2: One NFS, one Combat and one Scrolls;
Case 3: Two NFS, one Combat and one Scrolls;
Case 4: One NFS, one Combat, one Scrolls and two Birds.
On the whole, the average FPS ranges from 30.5 to 31.5 when one game runs alone. The average absolute deviations are 0.10 (Birds), 0.11 (NFS), 0.15 (Combat) and 1.47 (Scrolls) respectively, which means the FPS value is fairly stable. Of course, there are quite a few delay fluctuations; they usually mean the corresponding game scenes are changing rapidly, which is common for highly-interactive games, especially for Scrolls.
As the number of concurrently-running games increases (which means more interference between games), the FPS values decrease correspondingly while the average absolute deviations increase:
For Scrolls, when three games run simultaneously (Case 2), its average FPS is 28.3 and the AAD is 2.13. For four instances (Case 3), the values are 27.8 and 2.98 respectively.
For Combat, when three games run simultaneously, the average FPS is 29.2 and the AAD is 0.89. For four, the values are 28.8 and 1.59 respectively.
For the uncertainty of FPS values, we believe the main reason lies in two aspects:
1) There is interference among several running instances, including resource contention, which makes resource consumption not totally linear in the number of instances (as illustrated in Fig. 4). For example, Scrolls consumes the most resources, thus its uncertainty is the biggest.
2) As mentioned in Section 4.6, resource requirements of games are time-dependent and may vary in different stages, which also causes some uncertainty.
In any case, the system achieves a satisfactory gaming effect and the FPS remains relatively stable when multiple games run simultaneously.
5.2.4 Verification of the Performance Model
According to the performance model and the scheduling strategy, we test several typical server loads for verification. Without loss of generality, the following cases are presented.
1) One Scrolls, one Combat and two NFS. As presented in Table 5 (first row), the FPS value of each game is more than 27 and the lowest, Scrolls's, is about 27.1. All are not less than 90 percent of the FIXED_FPS (30), and thus acceptable. Because the system-RAM bandwidth is nearly exhausted (about 93 percent of the MAX_SYSTEM_THROUGHPUT), if another game joins (whether NFS or Birds), the FPS of Scrolls drops below the acceptable level.
Fig. 12. FPS of games.
Fig. 13. Comparison of the processing delay (1280 × 720; the lower the better).
TABLE 4
FPS Values and Average Absolute Deviations of Different Numbers of the Running Games

Game / Case       1     2     3     4
NFS       FPS   30.2  30.3  30.2  30.2
          AAD   0.18  0.24  0.44  0.70
Combat    FPS    N/A  29.2  28.8  28.6
          AAD    N/A  0.89  1.59  1.89
Scrolls   FPS    N/A  28.3  27.8  27.3
          AAD    N/A  2.13  2.98  3.30
Birds     FPS    N/A   N/A   N/A  29.8
          AAD    N/A   N/A   N/A  0.56
2) One Scrolls, one Combat, one NFS and three Birds. In this case, the sum of each kind of resource consumption is less than the corresponding system capacity; the relative maximum is the sum of memory throughputs, about 95 percent of the MAX_SYSTEM_THROUGHPUT. In Table 5 (second row), the FPS value of each game is more than 27.
3) One NFS, two Combat and five Birds.
4) Three NFS and five Birds.
In Cases 3 and 4, the sum of memory throughputs is about 96 percent of the MAX_SYSTEM_THROUGHPUT. As the sum of each kind of resource consumption is less than the corresponding system capacity, the FPS value of each game is still more than 27.
5.2.5 Discrepancy between Video and Audio
We have designed a method to calculate this discrepancy: on the server, sequences of full-black images are inserted into the video stream to replace the original scenes; at the same time, mute data replaces the corresponding audio data. On the client, a screen-recording program runs alongside the gaming client. Thus, by analyzing the audio/video streams of the recorded data, we can obtain the timestamps of the beginnings of the inserted video and audio sequences respectively, and the discrepancies can be calculated. Results show that these values are in the range of 180 ms to 410 ms (Table 6). Besides the preset delays mentioned above, we believe the reasons lie in the following:
1) The delay fluctuations of games. The corresponding FPS values may drop below 30, which increases the timing discrepancy because the accumulation of audio data is slowed.
2) The network's delay fluctuations also increase the timing discrepancy. Our tests are carried out on a campus network; we believe that over the Internet this factor will cause larger delays.
3) The measurement error. The recording software captures the screen periodically at 30 FPS, while the audio recording is continuous. Thus, the beginnings of some sequences of full-black images may be missed, which decreases the measured gap.
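The discrepancy computation on the recorded streams can be sketched as follows (a simplified model operating on per-frame black flags and per-sample mute flags; the function name and inputs are assumptions):

```python
def av_discrepancy(frame_times, frame_is_black, sample_times, sample_is_mute):
    """A/V discrepancy measurement (sketch): the gap between the timestamp
    of the first full-black video frame and the timestamp of the first
    mute audio sample in the recorded streams, in the same time unit."""
    t_video = next(t for t, black in zip(frame_times, frame_is_black) if black)
    t_audio = next(t for t, mute in zip(sample_times, sample_is_mute) if mute)
    return abs(t_video - t_audio)
```

Since video is sampled only every 1/30 s, the detected start of a black sequence carries up to one frame period of quantization error, which is the measurement error noted above.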
6 CONCLUSIONS AND FUTURE WORK
This paper proposes GCloud, a GPU/CPU hybrid cluster for
cloud gaming based on the user-level virtualization
technology. We focus on the guideline of task scheduling: to balance gaming responsiveness and cost, we fix each game's FPS to allocate just enough resources, which also mitigates the interference between games. Accordingly, a performance model has been developed to analyze the server capacity and the games' resource demands, which can locate the performance bottleneck and guide the task scheduling based on games' critical resource demands. Comparisons show that both the First-Fit-like and Best-Fit-like scheduling strategies outperform the others; moreover, they are near optimal in the batch processing mode.
In the future, we plan to enhance performance models to
support heterogeneous servers.
ACKNOWLEDGMENTS
The work is supported by the High Tech. R&D Program of China under Grant No. 2013AA01A215.
REFERENCES
[1] R. Shea, L. Jiangchuan, E.C.-H. Ngai, and C. Yong, “Cloud gam-
ing: Architecture and performance,” IEEE Netw., vol. 27, no. 4,
pp. 16–21, Jul./Aug. 2013.
[2] Z. Zhao, K. Hwang, and J. Villeta, “GamePipe: A virtualized cloud
platform design and performance evaluation,” in Proc. ACM 3rd
Workshop Sci. Cloud Comput., 2012, pp. 1–8.
[3] C.-Y. Huang, C.-H. Hsu, Y.-C. Chang, and K.-T. Chen,
“GamingAnywhere: An open cloud gaming system,” in Proc.
ACM Multimedia Syst., Feb. 2013, pp. 36–47.
[4] R. Phull, C.-H. Li, K. Rao, S. Cadambi, and S. T. Chakradhar,
“Interference-driven resource management for GPU-based het-
erogeneous clusters,” in Proc. 21st ACM Int. Symp. High Perform.
Distrib. Comput., 2012, pp. 109–120.
[5] V. T. Ravi, M. Becchi, G. Agrawal, and S. T. Chakradhar,
“Supporting GPU sharing in cloud environments with a transpar-
ent runtime consolidation framework,” in Proc. 20th ACM Int.
Symp. High Perform. Distrib. Comput., 2011, pp. 217–228.
[6] G. A. Elliott and J. H. Anderson, “Globally scheduled real-time
multiprocessor systems with GPUs,” Real-Time Syst., vol. 48, no. 1.
pp. 34–74, 2012.
[7] L. Chen, O. Villa, S. Krishnamoorthy, and G. R. Gao, “Dynamic
load balancing on single- and multi-gpu systems,” in Proc. IEEE
Int. Symp. Parallel Distrib. Process., 2010, pp. 1–12.
[8] M. Yu, C. Zhang, Z. Qi, J. Yao, Y. Wang, and H. Guan, “GRIS:
Virtualized GPU resource isolation and scheduling in cloud
gaming,” in Proc. 22nd Int. Symp. High-Perform. Parallel Distrib.
Comput., 2012, pp. 203–214.
[9] C. Zhang, J. Yao, Z. Qi, M. Yu, and H. Guan, “vGASA: Adaptive
scheduling algorithm of virtualized GPU resource in cloud
gaming,” IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 11,
pp. 3036–3045, 2014.
[10] M. Claypool and K. Claypool, “Latency and player actions in
online games,” Commun. ACM, vol. 49, no. 11, pp. 40–45, 2006.
[11] D. C. Barboza, V. E. F. Rebello, E. W. G. Clua, and H. Lima, “A
simple architecture for digital games on demand using low per-
formance resources under a cloud computing paradigm,” in Proc.
Brazilian Symp., Games Digital Entertainment, 2010, pp. 33–39.
[12] D. De Winter, P. Simoens, and L. Deboosere, “A hybrid thin-client
protocol for multimedia streaming and interactive gaming
applications,” in Proc. Int. Workshop Netw. Oper. Syst. Support Digi-
tal Audio Video, 2006, p. 15.
TABLE 5
FPS of Concurrently-Running Games
TABLE 6
Discrepancy Values on the Client Side

          Minimum   Maximum   Average
NFS       205 ms    395 ms    287 ms
Scrolls   213 ms    410 ms    323 ms
Combat    196 ms    336 ms    278 ms
Birds     180 ms    275 ms    242 ms
[13] W. Yu, J. Li, C. Hu, and L. Zhong, “Muse: A multimedia streaming
enabled remote interactivity system for mobile devices,” in Proc.
10th Int. Conf. Mobile Ubiquitous Multimedia, 2011, pp. 216–225.
[14] L. Shi, H. Chen, and J. Sun, “vCUDA: GPU accelerated high per-
formance computing in virtual machines,” in Proc. IEEE Int. Symp.
Parallel Distrib. Process., 2009, pp. 1–11.
[15] J. Duato, A. J. Pena, F. Silla, R. Mayo, and E. S. Quintana-Ortí,
“rCUDA: Reducing the number of GPU-based accelerators in
high performance clusters,” in Proc. Int. Conf. High Perform. Com-
put. Simul., 2010, pp. 224–231.
[16] V. Gupta, A. Gavrilovska, K. Schwan, H. Kharche, N. Tolia, V.
Talwar, and P. Ranganathan, “GViM: GPU-accelerated virtual
machines,” in Proc. ACM Workshop Syst.-Level Virtualization High
Perform. Comput., 2009, pp. 17–24.
[17] M. Bautin, A. Dwarakinath, and T.-c. Chiueh, “Graphic engine
resource management,” in Proc. 15th Multimedia Comput. Netw.,
2008, pp. 15–21.
[18] D. Wu, Z. Xue, and J. He, “iCloudAccess: Cost-effective streaming
of video games from the cloud with low latency,” IEEE Trans.
Circuits Syst. Video Technol., vol. 24, no. 8, pp. 1405–1416, Jan. 2014.
[19] H.-J. Hong, D.-Y. Chen, C.-Y. Huang, K.-T. Chen, and C.-H. Hsu,
“Placing virtual machines to optimize cloud gaming experience,”
IEEE Trans. Cloud Comput., vol. 3, no. 1, pp. 42–53, Jan.–Mar. 2015.
[20] S. Kato, K. Lakshmanan, R. Rajkumar, and Y. Ishikawa,
“TimeGraph: GPU scheduling for real-time multi-tasking environ-
ments,” in Proc. USENIX Annu. Tech. Conf., 2011, p. 2.
[21] L. Cherkasova and L. Staley, “Building a performance model of
streaming media application in utility data center environment,” in
Proc. 3rd IEEE/ACM Int. Symp. Cluster Comput. Grid, 2003, pp. 52–59.
[22] V. Ishakian and A. Bestavros, “MORPHOSYS: Efficient colocation
of QoS-constrained workloads in the cloud,” in Proc. 12th IEEE/
ACM Int. Symp. Cluster, Cloud Grid Comput., 2012, pp. 90–97.
[23] S. Wang and S. Dey, “Rendering adaptation to address communi-
cation and computation constraints in cloud mobile gaming,” in
Proc. Global Telecommun. Conf., Dec. 6–10, 2010, pp. 1–6.
[24] D. Klionsky, “A new architecture for cloud rendering and amor-
tized graphics,” M.S. thesis, School Comput. Sci., Carnegie Mellon
Univ., CMU-CS-11-122. [Online]. Available: http://reports-
archive.adm.cs.cmu.edu/anon/2011/abstracts/11-122.html
[25] A. Jurgelionis, P. Fechteler, P. Eisert, F. Bellotti, and H. David,
“Platform for distributed 3D gaming,” Int. J. Comput. Games Tech-
nol., vol. 2009, p. 1, 2009.
[26] A. Ojala and P. Tyrvainen, “Developing cloud business models:
A case study on cloud gaming,” IEEE Softw., vol. 28, no. 4,
pp. 42–47, Jul. 2011.
[27] S.-W. Chen, Y.-C. Chang, P.-H. Tseng, C.-Y. Huang, and C.-L.
Lei, “Measuring the latency of cloud gaming systems,” in Proc.
19th ACM Int. Conf. Multimedia, 2011, pp. 1269–1272.
[28] S. Choy, B. Wong, G. Simon, and C. Rosenberg, “The brewing
storm in cloud gaming: A measurement study on cloud to end-
user latency,” in Proc. 11th Annu. Workshop Netw. Syst. Support
Games, 2012, p. 2.
[29] Y.-T. Lee, K.-T. Chen, H.-I. Su, and C.-L. Lei, “Are all games equally
cloud-gaming-friendly? An electromyographic approach,” in Proc.
IEEE/ACM NetGames, 2012, pp. 109–120.
[30] K.-T. Chen, Y.-C. Chang, H.-J. Hsu, D.-Y. Chen, C.-Y. Huang, and
C.-H. Hsu, “On the quality of service of cloud gaming systems,”
IEEE Trans. Multimedia, vol. 16, no. 2, pp. 480–495, Feb. 2014.
[31] Y. Zhang, X. Wang, and L. Hong, “Portable desktop applications
based on P2P transportation and virtualization,” in Proc. 22nd
Large Installation Syst. Administration Conf., 2008, pp. 133–144.
[32] P. Guo, “CDE: Run any Linux application on-demand without
installation,” in Proc. 25th USENIX Large Installation Syst. Adminis-
tration Conf., 2011, p. 2.
[33] B. Xia and Z. Tan, “Tighter bounds of the first fit algorithm for
the bin-packing problem,” Discrete Appl. Math., vol. 158, no. 15,
pp. 1668–1675, 2010.
[34] C. Kenyon, “Best-fit bin-packing with random order,” in Proc. 7th
Annu. ACM-SIAM Symp. Discrete Algorithms, 1996, vol. 96,
pp. 359–364.
[35] M. Harchol-Balter, M. E. Crovella, and C. Duarte Murta, “On
Choosing a task assignment policy for a distributed server sys-
tem,” J. Parallel Distrib. Comput., vol. 59, no. 2, pp. 204–228, 1999.
[36] A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker,
and I. Stoica, “Dominant resource fairness: Fair allocation of mul-
tiple resource types,” in Proc. 8th USENIX Symp. Netw. Syst. Des.
Implementation, 2011, pp. 323–336.
[37] Y.-T. Lee and K.-T. Chen, “Is server consolidation beneficial to
MMORPG? A case study of World of Warcraft,” in Proc. IEEE 3rd
Int. Conf. Cloud Comput., 2013, pp. 435–442.
[38] Y.-T. Lee, K.-T. Chen, Y.-M. Cheng, and C.-L. Lei, “World of War-
craft avatar history dataset,” in Proc. 2nd Annu. ACM Multimedia
Syst., Feb. 2011, pp. 123–128.
[39] G. Hunt and D. Brubacher, “Detours: Binary interception of
Win32 functions,” in Proc. 3rd USENIX Windows NT Symp., Jul.
1999, p. 14.
[40] Y. Gu and R. L. Grossman, “UDT: UDP-based data transfer for
high-speed wide area networks,” Comput. Netw., vol. 51, no. 7,
pp. 109–120, May 2007.
Youhui Zhang received the BSc and PhD
degrees in computer science from Tsinghua Uni-
versity, China, in 1998 and 2002. He is currently
a professor in the Department of Computer Sci-
ence, Tsinghua University. His research interests
include computer architecture, cloud computing,
and high-performance computing. He is a mem-
ber of the IEEE and the IEEE Computer Society.
Peng Qu received the BSc degree in computer
science from Tsinghua University, China, in
2013. He is currently working toward the PhD
degree in the Department of Computer Science,
Tsinghua University, China. His research interests
include cloud computing and micro-architecture.
Cihang Jiang received the BSc degree in com-
puter science from Tsinghua University, China, in
2013. He is currently a master's student in the
Department of Computer Science, Tsinghua
University, China. His research interest is cloud
computing.
Weimin Zheng received the BSc and MSc
degrees in computer science from Tsinghua Uni-
versity, China, in 1970 and 1982, respectively.
He is currently a professor in the Department of
Computer Science, Tsinghua University, China.
His research interests include high performance
computing, network storage, and distributed
computing. He is a member of the IEEE and
the IEEE Computer Society.
1252 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 27, NO. 5, MAY 2016
 
Validation of pervasive cloud task migration with colored petri net
Validation of pervasive cloud task migration with colored petri netValidation of pervasive cloud task migration with colored petri net
Validation of pervasive cloud task migration with colored petri net
 
Web Service QoS Prediction Based on Adaptive Dynamic Programming Using Fuzzy ...
Web Service QoS Prediction Based on Adaptive Dynamic Programming Using Fuzzy ...Web Service QoS Prediction Based on Adaptive Dynamic Programming Using Fuzzy ...
Web Service QoS Prediction Based on Adaptive Dynamic Programming Using Fuzzy ...
 
Towards a virtual domain based authentication on mapreduce
Towards a virtual domain based authentication on mapreduceTowards a virtual domain based authentication on mapreduce
Towards a virtual domain based authentication on mapreduce
 
Toward a real time framework in cloudlet-based architecture
Toward a real time framework in cloudlet-based architectureToward a real time framework in cloudlet-based architecture
Toward a real time framework in cloudlet-based architecture
 
Protection of big data privacy
Protection of big data privacyProtection of big data privacy
Protection of big data privacy
 
Privacy preserving and delegated access control for cloud applications
Privacy preserving and delegated access control for cloud applicationsPrivacy preserving and delegated access control for cloud applications
Privacy preserving and delegated access control for cloud applications
 
Performance evaluation and estimation model using regression method for hadoo...
Performance evaluation and estimation model using regression method for hadoo...Performance evaluation and estimation model using regression method for hadoo...
Performance evaluation and estimation model using regression method for hadoo...
 
Frequency and similarity aware partitioning for cloud storage based on space ...
Frequency and similarity aware partitioning for cloud storage based on space ...Frequency and similarity aware partitioning for cloud storage based on space ...
Frequency and similarity aware partitioning for cloud storage based on space ...
 
Multiagent multiobjective interaction game system for service provisoning veh...
Multiagent multiobjective interaction game system for service provisoning veh...Multiagent multiobjective interaction game system for service provisoning veh...
Multiagent multiobjective interaction game system for service provisoning veh...
 
Efficient multicast delivery for data redundancy minimization over wireless d...
Efficient multicast delivery for data redundancy minimization over wireless d...Efficient multicast delivery for data redundancy minimization over wireless d...
Efficient multicast delivery for data redundancy minimization over wireless d...
 
Cloud assisted io t-based scada systems security- a review of the state of th...
Cloud assisted io t-based scada systems security- a review of the state of th...Cloud assisted io t-based scada systems security- a review of the state of th...
Cloud assisted io t-based scada systems security- a review of the state of th...
 
I-Sieve: An inline High Performance Deduplication System Used in cloud storage
I-Sieve: An inline High Performance Deduplication System Used in cloud storageI-Sieve: An inline High Performance Deduplication System Used in cloud storage
I-Sieve: An inline High Performance Deduplication System Used in cloud storage
 
Bayes based arp attack detection algorithm for cloud centers
Bayes based arp attack detection algorithm for cloud centersBayes based arp attack detection algorithm for cloud centers
Bayes based arp attack detection algorithm for cloud centers
 
Architecture harmonization between cloud radio access network and fog network
Architecture harmonization between cloud radio access network and fog networkArchitecture harmonization between cloud radio access network and fog network
Architecture harmonization between cloud radio access network and fog network
 
Analysis of classical encryption techniques in cloud computing
Analysis of classical encryption techniques in cloud computingAnalysis of classical encryption techniques in cloud computing
Analysis of classical encryption techniques in cloud computing
 
An anomalous behavior detection model in cloud computing
An anomalous behavior detection model in cloud computingAn anomalous behavior detection model in cloud computing
An anomalous behavior detection model in cloud computing
 
A tutorial on secure outsourcing of large scalecomputation for big data
A tutorial on secure outsourcing of large scalecomputation for big dataA tutorial on secure outsourcing of large scalecomputation for big data
A tutorial on secure outsourcing of large scalecomputation for big data
 
A parallel patient treatment time prediction algorithm and its applications i...
A parallel patient treatment time prediction algorithm and its applications i...A parallel patient treatment time prediction algorithm and its applications i...
A parallel patient treatment time prediction algorithm and its applications i...
 
A mobile offloading game against smart attacks
A mobile offloading game against smart attacksA mobile offloading game against smart attacks
A mobile offloading game against smart attacks
 

Dernier

Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 

Dernier (20)

Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 

A Cloud Gaming System Based on User-Level Virtualization and Its Resource Scheduling

Youhui Zhang, Member, IEEE, Peng Qu, Jiang Cihang, and Weimin Zheng, Member, IEEE

Abstract—Many believe the future of gaming lies in the cloud, namely Cloud Gaming, which renders an interactive gaming application in the cloud and streams the scenes as a video sequence to the player over the Internet. This paper proposes GCloud, a GPU/CPU hybrid cluster for cloud gaming based on user-level virtualization technology. Specifically, we present a performance model to analyze server capacity and games' resource consumption, which categorizes games into two types: CPU-critical and memory-io-critical. Based on this model, several scheduling strategies are proposed to improve resource utilization and are compared with existing ones. Simulation tests show that both the First-Fit-like and the Best-Fit-like strategies outperform the others; in particular, they are near optimal in the batch-processing mode. Other test results indicate that GCloud is efficient: an off-the-shelf PC can support five high-end video games running at the same time. In addition, the average per-frame processing delay is 8–19 ms under different image resolutions, which outperforms other similar solutions.

Index Terms—Cloud computing, cloud gaming, resource scheduling, user-level virtualization

1 INTRODUCTION

Cloud gaming provides game-on-demand services over the Internet. This model has several advantages [1]: it allows easy access to games without owning a game console or high-end graphics processing units (GPUs), and game distribution and maintenance become much easier.

For cloud gaming, the response latency is the most essential factor in the quality of gamers' experience "on the cloud". The number of games that can run on one machine simultaneously is another important issue, as it determines whether this mode is economical and thus really practical.
Thus, to optimize cloud gaming experiences, CPU/GPU hybrid systems are usually employed, because CPU-only solutions are not efficient for graphics rendering. One of the industrial pioneers of cloud gaming, OnLive,1 emphasized the former consideration: it allocated one GPU per instance for high-end video games. To improve utilization, some other service providers use virtual machine (VM) technology to share the GPU among games running on top of VMs. For example, GaiKai2 and G-cluster3 stream games from cloud servers located around the world to Internet-connected devices. Since the end of 2013, Amazon EC2 has also provided a service for streaming games based on VMs.4

More technical details can be found in non-commercial projects. GamePipe [2] is a VM-based cloud cluster of CPU/GPU servers. Its distinguishing characteristic is that not only cloud resources but also the local resources of clients can be employed to improve gaming quality. Another system, GamingAnywhere [3], uses user-level virtualization technology; compared with other solutions, its processing delay is lower.

Besides, task scheduling is regarded as another key means of improving resource utilization, which has been verified in the high-performance GPU-computing field [4], [5], [6], [7]. However, to the best of our knowledge, scheduling research for cloud gaming has not received much attention yet. One VM-based example is VGRIS [8] (including its successor VGASA [9]): it is a GPU-resource management framework in the host OS that schedules the virtualized resources of guest OSes.

This paper proposes the design of a GPU/CPU hybrid system for cloud gaming and its prototype, GCloud. GCloud uses user-level virtualization technology to implement a sandbox for different types of games, which can isolate multiple game instances from each other on a game server, transparently capture a game's video/audio outputs for streaming, and handle the remote client device's inputs.
Moreover, a performance model is presented; with it, we analyze the resource consumption of games and the performance bottleneck(s) of a server through extensive experiments using a variety of hardware performance counters. Accordingly, several task-scheduling strategies have been designed to improve server utilization and have been evaluated respectively.

Different from related research, we focus on the guideline for task assignment: on reception of a game-launch request, we must judge whether a server is suitable to undertake the new instance while still satisfying the performance requirements.

In addition, from the aspect of user-level virtualization (there are existing user-level solutions, like GamingAnywhere [3]), GCloud has its own characteristics:

1. http://www.onlive.com/
2. https://www.gaikai.com/
3. http://www.g-cluster.com/eng/
4. https://aws.amazon.com/game-hosting/

The authors are with the Department of Computer Science and Technology, Tsinghua University, Beijing, China. E-mail: {zyh02, zwm-dcs}@tsinghua.edu.cn, shen_yhx@163.com, famousjch@qq.com.

Manuscript received 13 Nov. 2014; revised 11 May 2015; accepted 11 May 2015. Date of publication 14 May 2015; date of current version 13 Apr. 2016. Recommended for acceptance by Y. Wang. For information on obtaining reprints of this article, please send e-mail to: reprints@ieee.org, and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TPDS.2015.2433916

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 27, NO. 5, MAY 2016. 1045-9219 © 2015 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
First, it implements a virtual input layer for each concurrently-running instance, rather than a system-wide one, which makes it possible to support more than one Direct3D game at the same time. Second, it designs a virtual storage layer to transparently store each client's configurations across all servers, which has not been addressed by related projects.

In summary, the following contributions have been made:

1) Enabling technologies based on light-weight virtualization are introduced, especially those specific to GCloud. (Section 3)

2) To balance gaming responsiveness and costs, we adopt a "just good enough" principle and fix the FPS (frames per second) of games at an acceptable level. Under this principle, a performance model is constructed to analyze the resource consumption of games, which categorizes games into two types: CPU-critical and memory-io-critical; several scheduling mechanisms are then presented to improve utilization and compared. In addition, different from previous work focused on the GPU resource, our work finds that the host CPU or the memory bus is the system bottleneck when several games are running simultaneously. (Section 4)

3) Such a cloud-gaming cluster has been constructed, which supports the mainstream game types. Test results show that GCloud is highly efficient: an off-the-shelf PC can support up to five concurrently-running video games (each at an image resolution of 1024 × 768 and 30 frames per second). The average per-frame processing delay is 8–19 ms under different image resolutions, which can satisfy the stringent delay requirement of highly-interactive games. Tests have also verified the effects of our performance model. (Section 5)

The remainder of this paper is organized as follows. Section 2 presents the background of cloud gaming as well as related work.
Sections 3 and 4 are the main part: the former introduces the user-level virtualization framework and its enabling technologies; the performance model and its analysis method are given in the latter, along with the scheduling strategies. Section 5 presents the prototype cluster and evaluates its performance. Section 6 concludes.

2 RELATED WORK

2.1 Cloud Gaming

Cloud gaming is a type of online gaming that allows direct, on-demand streaming of game scenes to networked devices, in which the actual game runs on the server end (the main steps are described in Fig. 1). To ensure interactivity, all of these serial operations must happen on the order of milliseconds, which critically challenges the system design. The sum of these latencies is defined as the interaction delay. Existing research [10] has shown that different types of games put forward different requirements.

One type of cloud-gaming solution is VM-based. For solutions based on VMs, Step 1 is completed in the guest OS while the other server-end steps are accomplished by the host. Barboza et al. [11] present such a solution, which provides cloud gaming services and uses three levels of managers for the cloud, hosts, and clients. Some existing work, like GaiKai, G-cluster, Amazon EC2 game streaming, and GamePipe [2], also belongs to this category.

In contrast to VM-based solutions, the user-level solution inserts the virtualization layer between applications and the run-time environment. This mode simplifies the processing stack and can thus reduce the extra overhead. GamingAnywhere [3] is such a user-level implementation, which supports Direct3D/SDL games on Windows and SDL games on Linux.

Some solutions have enhanced the thin-client protocol to support interactive gaming applications. Depending on the concrete implementation, they can be classified into the two types above. For example, Winter et al.
[12] have enhanced the thin-client server driver to integrate a real-time desktop streamer that streams the graphical output of applications after GPU processing, which can be regarded as a light-weight virtualization-based solution. In contrast, Muse [13] uses VMs to isolate and share GPU resources on the cloud end and enhances the remote frame buffer (RFB) protocol to compress the frame-buffer contents of server-side VMs.

However, these efforts have focused on the optimization of interaction delay, that is, on the performance of a single game on the cloud, rather than on the interference between concurrently-running instances. Moreover, none of these systems has presented any specific scheduling strategy.

2.2 Resource Scheduling

For high performance computing (HPC), GPU virtualization has been widely researched [14], [15], [16] for general-purpose computing. From the scheduling viewpoint, there are also several studies, including Phull et al. [4], Ravi et al. [5], Elliott and Anderson [6], L. Chen et al. [7], and Bautin et al. [17].

Fig. 1. The whole workflow of cloud gaming.
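Because the steps of the Fig. 1 workflow are strictly serial, the interaction delay defined above is simply the sum of the per-step latencies. The following minimal sketch illustrates this budget view; the step names and millisecond figures are illustrative assumptions, not measurements from the paper.

```python
# Illustrative sketch: the interaction delay of a cloud-gaming pipeline
# is the sum of its serial per-step latencies (cf. Fig. 1). All step
# names and timings below are assumed values for illustration only.

PIPELINE_MS = {
    "input_uplink":   15.0,  # client input travels to the server
    "game_logic":      5.0,  # CPU prepares the next frame
    "render":          8.0,  # GPU renders the frame
    "capture_encode": 12.0,  # frame capture and video encoding
    "downlink":       15.0,  # encoded frame streamed to the client
    "decode_display":  8.0,  # client decodes and displays the frame
}

def interaction_delay_ms(pipeline):
    """All steps are serial, so the interaction delay is their sum."""
    return sum(pipeline.values())

def is_responsive(pipeline, budget_ms=100.0):
    """Check the end-to-end delay against a per-interaction budget;
    100 ms is an assumed figure for a highly-interactive game."""
    return interaction_delay_ms(pipeline) <= budget_ms

total = interaction_delay_ms(PIPELINE_MS)
```

The budget view makes the design pressure concrete: shaving any single step helps, but only the sum must stay within the tolerance of the game genre.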
However, none of these studies has considered the characteristics of cloud gaming, including the critical demand on processing latencies, the highly-coupled sequential operations, and so on.

Work on scheduling for cloud gaming is limited. VGRIS [8] and its successor VGASA [9] are resource-management frameworks for VM-based GPU resources that implement several scheduling algorithms for different objectives. However, they focus on scheduling rendering tasks on a GPU, without considering other tasks like image capture, encoding, etc. iCloudAccess [18] proposes an online control algorithm that performs gaming-request dispatching and VM-server provisioning to reduce the latencies of a cloud gaming platform. A recent work [19] studies the optimized placement of cloud-gaming-enabled VMs; the proposed heuristic algorithms are efficient and nearly optimal. Our work can be regarded as complementary to these studies, because they focus on VM-granularity dispatching / provisioning while we pay attention to issues inside an OS.

One related work on GPU scheduling (though not cloud-gaming-specific) is TimeGraph [20]: it is a real-time GPU scheduler that modifies the device driver to protect important GPU workloads from performance interference. Similarly, it does not consider the characteristics of cloud gaming.

Another category of related research [21], [22] concerns streaming-media applications. For example, Cherkasova and Staley [21] developed a workload-aware performance model for video-on-demand (VOD) applications, which is helpful for measuring the capacity of a streaming server as well as the resource requirements. We have drawn on their design principles to construct our performance model.

2.3 Others

To improve processing efficiency and adaptation, Wang and Dey [23] propose a rendering adaptation technique that adapts the game-rendering parameters to satisfy Cloud Mobile Gaming's constraints.
Klionsky [24] has presented an architecture that amortizes the cost of rendering across users. However, these two technologies are not transparent to games. In addition, Jurgelionis et al. [25] explored the impact of networking on gaming, and Ojala and Tyrvainen [26] analyzed the business model behind a cloud gaming company.

In summary, compared with the above-mentioned work, GCloud has the following features:

1) It is based on user-level virtualization. Compared with existing user-level solutions, GCloud provides more thorough solutions for the virtual input / storage layers.

2) From the aspect of performance modeling and scheduling, more real jobs (including image capture, encoding, etc.) are considered (compared with VGRIS / VGASA [8], [9]). In addition, we use hardware-assisted video encoding to mitigate the interference between games and to improve performance.

3) Last but not least, our work is focused on related issues inside a node, while [18], [19] work at the VM granularity.

4) Furthermore, quite a few studies have been carried out to measure the performance of cloud gaming systems, like [27], [28], [29], and [30]. We also referred to them to complete our measurements.

3 SYSTEM ARCHITECTURE AND ENABLING TECHNOLOGIES

3.1 The Framework

The system (Fig. 2) is built as a cluster of CPU / GPU-hybrid computing servers; a dedicated storage server is used as the shared storage. Each computing server can host the execution of several games simultaneously. One of these servers is employed as the manager node, which collects real-time running information from all servers and completes management tasks, including task assignment, user authentication, etc.

It should be noted that the framework in Fig. 2 is for small / medium system scales. For a large-scale system with many users, a hierarchical architecture is needed to avoid an information-exchange bottleneck.
In fact, because the quality of gamers' experience highly depends on the response latency, and the latter is sensitive to the physical distance between clients and servers, the architecture may be geographically distributed, which is out of the scope of this paper. It also means that the scale at any one site will not be very large.5

Initially, gaming agents on the available computing servers register with the manager, indicating that they are ready and

Fig. 2. System architecture.

5. According to OnLive, the theoretical upper bound of the distance between a user and a cloud gaming server is approximately 1,000 miles. In China, some gaming systems provide services for just one city or several cities.
which games they can execute. When a client wants to play some game, the manager searches for candidates among the registered servers. After such a server has been chosen, a start-up command is sent to the corresponding agent to boot the game within a light-weight virtualization environment; the server's address is then sent to the client, and future communication takes place directly between the two ends. During run time, each agent collects local runtime information and sends it to the manager periodically, so that the latter always has the latest status of resource consumption.

The storage server plays an important role in providing a personalized game configuration for each user. For instance, suppose User A has played Game B on Server C. Now A wants to play the game again, but the manager finds that Server C's resources have been depleted, so the task has to be assigned to another server, D. Consequently, it is necessary to restore A's configuration of B on D, including the game's progress and other customized information. The storage server serves as the shared storage for all computing nodes.

3.2 The User-Level Virtualization Environment

For each game, API interception is employed to implement a lightweight virtualization environment. API interception means intercepting calls from the application to the underlying running system; typical applications include software streaming [31], [32], etc. Here it is used to catch the corresponding resource-access APIs issued by the game. In addition, our main target platform is MS Windows, as Windows dominates the PC video-game world.

3.2.1 Image Capture

Usually, gaming applications employ a mainstream 3D computer-graphics rendering library, like Direct3D or OpenGL, for hardware (GPU) acceleration; GCloud supports both.
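The configuration-restore scenario above (User A's state for Game B following the user from Server C to Server D) reduces to a path-redirection rule: user-modified resources resolve to the shared storage, so any computing server sees the same state. A minimal sketch, with the interception mechanism abstracted away and the path layout an illustrative assumption:

```python
# Sketch of the shared-storage redirection behind the restore scenario:
# per-user mutable state lives on the storage server, so a game booted
# on any computing server resolves the same configuration paths.
# The mount points and path layout are illustrative assumptions.

SHARED_STORAGE = "/shared"   # mount point of the shared storage server
IMMUTABLE_ROOT = "/games"    # cloned game installation, identical on all servers

def resolve(user, game, path, mutable_paths):
    """Redirect accesses to user-modified resources to the shared
    storage; leave immutable game files on the local clone."""
    if path in mutable_paths:   # e.g. save files, per-user settings
        return f"{SHARED_STORAGE}/{user}/{game}/{path}"
    return f"{IMMUTABLE_ROOT}/{game}/{path}"

mutable = {"savegame.dat", "settings.ini"}
# The same logical path resolves identically on Server C and Server D:
p = resolve("userA", "gameB", "savegame.dat", mutable)
q = resolve("userA", "gameB", "textures.pak", mutable)
```

Because the rule depends only on the user, the game, and the logical path, the last configuration written on Server C is exactly what a fresh instance on Server D reads back.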
In the case of Direct3D, the typical workflow of a game is usually an endless loop: First, some CPU computation prepares the data for the GPU, e.g., calculating objects in the upcoming frame. Then, the data is uploaded to the GPU buffer and the GPU performs the computation, e.g., rendering, using its buffer contents, and fills the front buffer. To fetch the contents of the image into system memory for subsequent processing, we intercept Direct3D's Present API. For OpenGL, we intercept the Present-like API, glutSwapBuffers, to capture images. For other games based on a common GUI window, we simply set a timer for the application's main window and intercept the corresponding message handler to capture the image of the target window periodically.

3.2.2 Audio Capture

Capturing audio data is a platform-dependent task. Because our main target platform is MS Windows, we intercept the Windows Audio Session APIs to capture the sound. Core Audio serves as the foundation of quite a few higher-level APIs; thus this method offers the best adaptability.

3.2.3 Virtual Input Layer

Flash-based or OpenGL-based applications usually use the window's default message-loop to handle inputs. Thus, the solution is straightforward: We inject a dedicated input-thread into the intercepted game-process. On reception of any control command from the client, this thread converts it into a local input message and sends it to the target window.

For Direct3D-based games, the situation is more complicated. The existing work [3] replays input events using the SendInput API on Windows. However, SendInput inserts events into a system-wide queue, rather than the queue of a specific process. So, it is difficult to support more than one instance in a non-VM solution.
To overcome this problem, we intercept quite a few DirectInput APIs to simulate input-queues for each virtualized application; thus the user's input can be pushed into these queues and made accessible to the applications.

3.2.4 Virtual Storage Layer

From the storage aspect, a program can be divided into three parts [31]: Parts 1 and 2 include all resources provided by the OS and those created/modified by the installation process; Part 3 is the data created/modified/deleted during the run time, which contains the game-configurations of each user. For the immutable parts, it is relatively easy to distribute them to servers through some system-clone method. The focus is how to migrate resources of Part 3 across servers to provide personalized game-configurations for users.

We construct a virtual storage layer by intercepting the file-system and registry-access APIs of all games. During the run time, any resource modified by the game instance is moved into Part 3. When the case described in Section 3.1 occurs, the virtual storage layer of Game B on the current server can redirect resource-accesses to the shared storage to visit the latest configurations of User A, which were stored during the last run on Server C.

4 PERFORMANCE MODEL AND TASK SCHEDULING

As mentioned in Section 1, the response latency and the number of games that one machine can execute simultaneously are both essential to a cloud gaming system. To a large extent, they are in contradiction, and existing systems (like [3], [11], [12]) usually focus on the first issue. However, this is not always economical. For example, if the FPS of a given game is too high, it will consume more resources. Moreover, lossy compression will counteract the high video-quality to a certain extent. Some scheduling work, like VGRIS / VGASA [8], [9], has presented multi-task scheduling strategies.
There are several essential differences between our work and VGRIS / VGASA: First, they focus on how to schedule existing games on a server, including the allocation of enough GPU resources for a game, etc. In contrast, GCloud focuses on the assignment of a new task. Second, they consider only the GPU resource, without accounting for any other operation (like image-capture, encoding, etc.), while our tests (presented in Section 4.4) show that the host CPU or the memory bus is the bottleneck. Third, VGRIS and VGASA are VM-specific.

1242 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 27, NO. 5, MAY 2016
In this paper, we adopt a "just good enough" strategy, which means that we keep the game quality at some acceptable level and then try to satisfy the interactivity requests of as many games as possible. Hence, there are two main issues:

Issue 1: For a given server and its running game instances, how to make sure the game quality is acceptable?

Issue 2: On an incoming request, which server is suitable to launch the new game instance?

For Issue 1, we first give a brief pipeline model for cloud gaming, which can be used to judge whether the game quality is acceptable or not. Second, a method to fix the FPS is presented to provide the "just good enough" quality; a hardware-assisted video-encoding technique is also used to further mitigate the interference between games. For Issue 2, several resource-metrics are given. Then we carry out tests to measure the server capacity and to categorize games into different types. Accordingly, we design a server-capacity model and corresponding task-assignment strategies. These strategies are compared with others.

4.1 Game Quality

A cloud gaming system's interaction delay contains three parts [27]: (1) Network delay, the time required for a round of data exchange between the server and client; (2) Play-out delay, the time required for the client to handle the received data for playback; (3) Processing delay, the time required for the server to process a player's command, and to encode and send the corresponding frame back.

This paper is mainly about the server side, and the network is assumed to provide sufficient bandwidth; thus we focus on the processing delay, which should be confined to a limited range. The work [25] on measuring the latency of cloud gaming has shown that, for some existing service providers (like OnLive), the processing delay is about 100-200 ms. Thus, we use 100 ms as our scheduling target, denoted MAX_PD.
Another key metric is the FPS; the required FPS is denoted FIXED_FPS. In this work, FIXED_FPS is set to 30 by default.

As presented in Fig. 1, the gaming workflow can be regarded as a pipeline of four steps: operations of gaming logic, graphic rendering (including the image capture), encoding (including the color-space conversion) and transmission. In addition, our tests show that, given sufficient bandwidth, the delay of transmission is much less than that of the other steps. Thus, the fourth step can be skipped and we focus on the remaining three.

Furthermore, the first two steps are completed by the intercepted process, which is transparent to us; thus we combine them together and denote the sum of their latencies by Tpresent. The average processing time of the encoding step is denoted by Tencoding (the pipeline is presented in Fig. 3). Hence, if the following conditions (referred to as the Responsiveness Conditions) are satisfied, the requirements on the FPS and processing delay will undoubtedly be met. To be more precise, satisfaction of the first two conditions implies the last one under the default case:

Tpresent ≤ 1/FIXED_FPS and (1)
Tencoding ≤ 1/FIXED_FPS and (2)
Tencoding + Tpresent ≤ MAX_PD (3)

4.2 Fixed FPS

To provide the "just good enough" gaming quality, the FPS value should be fixed to some acceptable level (Issue 1). Because the interface of GPU drivers is not open, our solution is in the user-space, too. Taking a Direct3D game as an example, we intercept the Present API to insert a Sleep call for adjusting the loop latency: The rendering complexity is mostly affected by the complexity of gaming scenes, and the latter changes gradually. Thus, it is reasonable to predict Tpresent based on its own historical information.
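This prediction-and-sleep scheme can be sketched as follows (a simplified, platform-independent model with our own names; the real implementation performs the sleep inside the intercepted Present call):

```cpp
#include <cassert>
#include <deque>
#include <numeric>

// Sketch of fixed-FPS pacing: predict the next loop's Tpresent from the
// average of the last 100 observed loops and sleep for the remainder of the
// frame budget. All times are in milliseconds; names are ours.
class FramePacer {
public:
    explicit FramePacer(double fixed_fps) : budget_ms_(1000.0 / fixed_fps) {}

    // Record the measured Tpresent of a finished render loop.
    void record(double t_present_ms) {
        history_.push_back(t_present_ms);
        if (history_.size() > 100) history_.pop_front();  // 100-loop window
    }

    // Tsleep = 1/FIXED_FPS - Tavg_present, clamped at zero when the
    // rendering alone already exceeds the frame budget.
    double sleep_ms() const {
        if (history_.empty()) return 0.0;
        double avg = std::accumulate(history_.begin(), history_.end(), 0.0)
                     / history_.size();
        double s = budget_ms_ - avg;
        return s > 0.0 ? s : 0.0;
    }

private:
    double budget_ms_;               // 1/FIXED_FPS, in ms
    std::deque<double> history_;     // recent Tpresent samples
};
```

With FIXED_FPS = 30 the frame budget is about 33.3 ms, so a game averaging 14 ms per frame would be put to sleep for roughly 19.3 ms each loop.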
In the implementation, the average time (denoted Tavg_present) of the past 100 loops is used as the prediction for the upcoming one (a similar method has been adopted by [8], [9]) and the sleep time (Tsleep) is calculated as:

Tsleep = 1/FIXED_FPS − Tavg_present

The true problem lies in how to judge whether a busy server is suitable to undertake a new game instance or not. Thus, we should solve Issue 2 anyway.

4.3 Hardware-Assisted Video Encoding

The fixed FPS can mitigate the interference between games because it allocates just enough resource for rendering. Further, we use the hardware-assisted video-encoding capability of commodity CPUs for even less interference. The hardware technology of Intel CPUs, Quick Sync, has been employed. It provides a full-hardware pipeline to compress raw images in the RGB or YUV format into H.264 video. Quick Sync has become one of the mainstream hardware encoding technologies.6 On the test server, a Quick-Sync-enabled CPU can simultaneously support up to twenty 30-FPS encoding tasks (at an image resolution of 1024 × 768); the latency for one frame is as low as 4.9 ms.

Fig. 3. Gaming pipeline.

6. Quick Sync was introduced with the Sandy Bridge CPU microarchitecture. It is a part of the integrated graphics processor on the same die as the CPU. Thus, to make it work with a discrete graphics card (used for gaming), some special configuration should be set up, as described by http://mirillis.com/en/products/tutorials/action-tutorial-intel-quick-sync-setup_for_desktops.html. For AMD, its Accelerated Processing Unit (APU) has a similar function.
Moreover, the CPU utilization of one such task is almost negligible, less than 0.2 percent. (Details are presented in Appendix A, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPDS.2015.2433916.) The result means it causes little interference to other tasks. Thus, we use it as the reference implementation in all following tests, as well as in the system prototype.

4.4 Resource-Metrics

Five types of system resources are considered: the CPU, GPU, system RAM, video RAM and the system bandwidth. The first two are denoted by utilization ratios; the next two are represented by memory consumptions; and the last refers to the miss number of the LLC (Last Level Cache). Correspondingly, the server capacity and the average resource requirements of a game (under the condition satisfying the Responsiveness Conditions) can be denoted by a five-item tuple ⟨U_CPU, U_GPU, M_HOST, M_GPU, B⟩.

Based on the above metrics, we should judge whether the remaining resource-capacities of a server can meet the demand of a new game or not. The key lies in how to measure the capacity of a server, as well as the game requirements. We present the following method to accomplish these tasks, namely, to solve Issue 2.

4.4.1 Test Methods

Commercial GPUs usually implement driver / hardware counters to provide runtime performance information. For example, NVIDIA's PerfKit APIs7 can collect resource-consumption information of each GPU in real time. Hence, we can get results accumulated since the previous time the GPU was sampled, including the percentage of time the GPU is idle/busy, the consumption of graphics memory, etc. For commodity CPUs, a similar method is used, too. For instance, Intel already provides the capability to monitor performance events inside its processors.
Through its performance counter monitor (PCM), many performance-related events per CPU core, including the number of LLC misses, instructions per CPU cycle, etc., can be obtained periodically. The sample periods for CPU and GPU are both set to 3 s.

In addition, we embed monitoring code into the intercepted gaming APIs to record the processing delay of each frame, which is used to judge whether the Responsiveness Conditions have been met or not.

Moreover, it is necessary to note that the integrated graphics processor (which contains the Quick Sync encoding engine) shares the LLC with the CPU cores and there is no on-chip graphics memory.8 Thus the hardware encoding process needs to access the system memory (if the required data misses in the LLC), which means the corresponding miss number is still suitable to indicate the memory throughput with hardware encoding.

In addition, we select four representative games, including three Direct3D video games (Direct3D is the most popular development library for PC video games) and one OpenGL game. They are:

1) Need for Speed-Most Wanted (abbreviated to NFS), a classic racing video game.
2) Modern Combat 2-Black Pegasus (abbreviated to Combat), a first-person shooter video game.
3) Elder Scrolls: Skyrim-Dragonborn (abbreviated to Scrolls), an action role-playing video game.
4) Angry Birds Classic (abbreviated to Birds), the well-known mobile-phone game's PC version.

Several volunteers were invited to play games on the cloud gaming system and encouraged to play quite a few game scenes, for more than 15 minutes per game. After several such sessions, runtime information can be collected for further analysis.

4.4.2 Test Cases

A Windows 7 (64-bit) PC is used as the server, equipped with an NVIDIA GTX780 GPU adapter (3 GB video memory), a common Core i7 CPU (four cores, 3.4 GHz) and 8 GB RAM.
By default, games are streamed at a resolution of 1024 × 768 and the game picture quality is set to medium in all cases; the FPS is fixed to 30. Video encoding is completed by Quick Sync.

Single instance (resource-requirement tests). Each game has been played in our virtualization environment alone and resource consumptions are recorded in real time. As expected, the Responsiveness Conditions can be met for each game on this powerful machine; the corresponding resource-requirements are presented in Table 1. Considering resource consolidation, the average value of each item of the tuple has been used.

Multiple instances running simultaneously. Quite a few game groups have been executed and sampled simultaneously. For example, we have played 2–6 NFS instances at the same time. Based on the runtime information, we can see that this server can support up to five acceptable instances simultaneously (we consider a game's running quality acceptable if its average FPS is not less than 90 percent of FIXED_FPS). When six instances are running, the FPS value is less than 27, which is regarded as unacceptable.

Furthermore, we should identify the bottleneck, which is pivotal for task assignment. Considering the following facts (in Fig. 4a), NFS is memory-io-critical: When no more than five games are running simultaneously, the average FPS is stable (about 30.3) and the value of million-miss-number-per-second increases almost linearly. When six instances are running, the FPS is about 24.7 and the throughput remains nearly unchanged (from 37.6 to 37.9). At the same time, both U_GPU and U_CPU are far from exhausted, 47 and 71 percent respectively. This phenomenon indicates that memory accesses have impeded tasks from utilizing the CPU/GPU resources efficiently. Moreover, memory consumptions are not the bottleneck; thus no swap operations will happen (for clarity, the memory-consumption information is omitted from these figures).
7. http://www.nvidia.com/object/nvperfkit_home.html
8. http://www.hardwaresecrets.com/printpage/Inside-the-Intel-Sandy-Bridge-Microarchitecture/1161

For Combat and Scrolls (in Figs. 4b and 4c), the same conclusion holds: Under the condition satisfying
the performance requirements, there can be at most three concurrent instances of Scrolls. For Combat, the maximum number of instances is five. At the same time, both U_GPU and U_CPU are limited, too. On the other hand, Birds (in Fig. 4d) is CPU-critical because it can exhaust the CPU (97 percent when 10 instances are running, with an average FPS of 27.1), while the value of million-miss-number-per-second increases almost linearly.

4.4.3 Modeling

Based on the previous results, we normalize the resource requirements and the server capacity; the principle is critical-resource-first: (1) For a memory-io-critical game of which the game-server can host Ni instances, the fifth item (B) of its tuple is set to MAX_SYSTEM_THROUGHPUT9 / Ni, regardless of the absolute value. (2) For any CPU-critical game of which the game-server can host Nj instances, its U_CPU is set to 100% / Nj. (3) The other tuple items are kept unchanged. For example, the tuple of NFS is ⟨9.15%, 2.01%, 526, 220, MAX_SYSTEM_THROUGHPUT / 5⟩, and the Birds tuple is ⟨100% / 10, 1.1%, 181, 142, 6.54⟩. Tuples of the four games are listed in Table 2.

Then, for a set of M games (each denoted Game_i, 0 ≤ i < M), if the sum of each kind of resource consumption is less than the corresponding system capacity, we consider that these games can run simultaneously and smoothly. Formally, we use the following notations:

⟨U_CPU_game_i, U_GPU_game_i, M_HOST_game_i, M_GPU_game_i, B_game_i⟩: the tuple of resource requirements of Game_i;

⟨100%, 100%, SERVER_RAM_CAPACITY, SERVER_VIDEO_RAM_CAPACITY, MAX_SYSTEM_THROUGHPUT⟩_server: the capacity of a given server.

If the following conditions are met, this server can host all games of the set running simultaneously:

Σ_{0≤i<M} U_CPU_game_i ≤ 100%
Σ_{0≤i<M} U_GPU_game_i ≤ 100%
Σ_{0≤i<M} M_HOST_game_i ≤ SERVER_RAM_CAPACITY
Σ_{0≤i<M} M_GPU_game_i ≤ SERVER_VIDEO_RAM_CAPACITY
Σ_{0≤i<M} B_game_i ≤ MAX_SYSTEM_THROUGHPUT

Fig. 4. FPS and resource-consumptions of games.
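The admission test formalized above can be sketched as follows (the struct, function name and the concrete capacity numbers in the usage below are ours for illustration):

```cpp
#include <cassert>
#include <vector>

// A resource tuple as defined in Section 4.4: per-game requirement or
// per-server capacity, one field per resource type.
struct Tuple {
    double u_cpu;    // CPU utilization, 0..100 (%)
    double u_gpu;    // GPU utilization, 0..100 (%)
    double m_host;   // system RAM, MB
    double m_gpu;    // video RAM, MB
    double b;        // LLC-miss throughput (million miss-number per second)
};

// A server can host a set of games iff, for every resource, the summed
// requirements stay within the server's capacity tuple.
bool fits(const std::vector<Tuple>& games, const Tuple& server) {
    Tuple sum{0, 0, 0, 0, 0};
    for (const Tuple& g : games) {
        sum.u_cpu  += g.u_cpu;   sum.u_gpu += g.u_gpu;
        sum.m_host += g.m_host;  sum.m_gpu += g.m_gpu;
        sum.b      += g.b;
    }
    return sum.u_cpu  <= server.u_cpu  && sum.u_gpu <= server.u_gpu &&
           sum.m_host <= server.m_host && sum.m_gpu <= server.m_gpu &&
           sum.b      <= server.b;
}
```

Using the normalized NFS tuple with a hypothetical capacity whose B limit sits just above five shares, five NFS instances pass the test while a sixth fails on B, matching the five-instance limit measured in Section 4.4.2.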
TABLE 1
Resource-Requirements of Each Game

Game     U_CPU (%)  U_GPU (%)  M_HOST (MB)  M_GPU (MB)  B (million miss-number per second)
NFS      9.15       2.01       526          220         8.10
Scrolls  14.55      7.02       795          560         13.52
Combat   8.47       3.27       800          296         7.97
Birds    9.36       1.1        181          142         6.54

9. MAX_SYSTEM_THROUGHPUT refers to the maximal LLC-miss number per second that the system can sustain. It can be evaluated by a specially-designed program that accesses the memory space randomly and intensively.
For example, one Scrolls, one Combat and two NFS instances can run at the same time; if an extra NFS joins, the condition will not be met and the bottleneck is B. Quite a few tests with real games are given in Section 5.1 to verify this design.

4.5 The Scheduling Strategy

In conclusion, the following two-stage procedure for task assignment is used.

Ready-Stage: When a game goes online, it is tested to obtain its resource requirements. Then, for any game (denoted Game_i), a tuple ⟨U_CPU, U_GPU, M_HOST, M_GPU, B⟩_game_i can be given to represent its requirements. In addition, for any Server_j, its capacity is denoted ⟨U_CPU, U_GPU, M_HOST, M_GPU, B⟩_server_j. The corresponding test process has been described in the previous paragraphs and each element is labeled with the corresponding maximum capacity.

Runtime-Stage: During the run time, the current resource consumptions of each server (denoted ⟨U_CPU, U_GPU, M_HOST, M_GPU, B⟩_server_j_cur; in our prototype, the average value over the latest minute is used) are sampled periodically.

Moreover, the main goal of our scheduling strategy is to minimize the number of servers used, which can be regarded as a bin-packing problem. Several theoretical studies [33], [34] have shown that the First-fit and Best-fit algorithms behave well for this problem, especially for the online version with requests inserted in random order [34]. Thus, we have designed two heuristic task-scheduling algorithms based on the well-known First-fit and Best-fit strategies, namely first-fit-like (FFL) and best-fit-like (BFL). The principle is straightforward; thus we only give their outlines here:

In FFL, for a given request of game_i, all servers are checked in order; if a server (say, server_j) can host the new game, which means that each kind of resource consumption over all games on server_j (including game_i) does not exceed the capacity, the algorithm ends successfully.
In BFL, the procedure is similar. The difference is that, if there is more than one suitable server, the one that would be left with the least amount of the critical resource is chosen.

4.5.1 Tests with Artificial Traces

We have simulated our algorithms in two situations:

1) Several requests for the four games come simultaneously and must be dispatched instantly, namely, in the batch processing mode.

2) Requests come one by one. The request sequence follows a Poisson process with a mean inter-arrival time of 5 seconds; the duration of each game also follows a Poisson process and the mean time is 40 minutes.

In both situations, we assume that there are enough servers and each has an initial resource usage of ⟨10%, 5%, 3096, 512, 0⟩ (gathered from our real servers). Thus, we can start a new server whenever needed. Moreover, from the aspect of resource usage, we mainly focus on the number of servers used by each algorithm.

For the first situation, we have compared our algorithms with three others:

Size-based task assignment (STA) [35]: This algorithm is widely used in distributed systems, in which all tasks within a given size range of resource requirements are assigned to a particular server. Specific to our case, two types of servers (for CPU-critical and for memory-io-critical games respectively) are designated.

Packing algorithm (PA): A greedy algorithm. Each server is assigned as many games as possible until all games have been dispatched.

Dominant resource fairness (DRF) [36]: A fair sharing model that generalizes max-min fairness to multiple resource types. In our implementation, the collection of all currently-used servers (called small servers) is regarded as one big server. Whether the big server can satisfy an incoming request depends on whether such a small server exists. If not, a new small server is added to enlarge the big one.
The scheduling strategy inside the big server is First-fit, and all gaming requests are considered to be issued by different users.

We also estimate the ideal server-number for reference. For each kind of resource (denoted s), the minimum server number is ⌈(Σ_{i=1}^{n} R_i^s) / R^s⌉. Here n is the total number of game requests; R_i^s denotes the s-resource requirement of the i-th game and R^s is the corresponding resource capacity of a server. Thus, the maximum over all these minimums is the ideal number.

In the second situation, our algorithms have been compared with the STA algorithm only, because the others require information about the whole request sequence (which is unavailable in this case) and would degenerate into FFL.

TABLE 2
Resource-Requirements of Games

Game     Tuple                                                 Game type
NFS      ⟨9.15%, 2.01%, 526, 220, MAX_SYSTEM_THROUGHPUT / 5⟩   memory-io-critical
Scrolls  ⟨14.55%, 7.02%, 795, 560, MAX_SYSTEM_THROUGHPUT / 3⟩  memory-io-critical
Combat   ⟨8.47%, 3.27%, 800, 296, MAX_SYSTEM_THROUGHPUT / 5⟩   memory-io-critical
Birds    ⟨10%, 1.1%, 181, 142, 6.54⟩                           CPU-critical

Fig. 5. Server-numbers in Situation 1.

Simulation results of Situation 1 are given in Fig. 5. The y-axis stands for the number of needed servers (for clarity,
values have been normalized) when several requests arrive simultaneously (the request number is shown on the x-axis). We can see that, compared with the others, the heuristic algorithms are quite good. Even against the ideal number, our algorithms are really close to optimal (the maximal value is 101.23 percent of the ideal). Moreover, these two algorithms perform almost equally in all cases.

Fig. 6 shows the number of requested servers when requests arrive in sequence (Situation 2). We can see that our heuristic algorithms are more efficient than STA. The two algorithms also perform similarly in all cases: compared with the BFL algorithm, FFL consumes at most 3.6 percent more resources (57 versus 55 servers).

Finally, results show that FFL is about 20 percent faster than BFL, while both are fast enough: in the batch processing mode, both can complete the task assignment within several milliseconds for 1,000 requests.

4.5.2 Tests with Real Game-Traces

To further evaluate the proposed task-scheduling strategies, we conduct a trace-driven simulation of a large-scale cluster (a similar simulation method has been used in [37]); each server is the same as the one presented in Section 4.4. The dataset we used is the World of Warcraft history dataset provided by Yeng-Ting Chen et al. [38]. Although this dataset is based on the MMORPG "World of Warcraft", we think it is useful in our case because cloud gaming and MMORPGs share many similarities, such as wide variations in gaming time, a huge bandwidth demand and a large number of concurrent users. Of course, necessary pre-processing is introduced to make the dataset more suitable: we have mapped the first four races in the dataset (Blood Elf, Orc, Tauren and Troll) to the four games in our system, and the remaining one (the Undead) is mapped to one of these four games randomly.
In detail, we have used traces of three months that consist of 396,631 game requests (details are shown in Table 3). Accordingly, a cluster of 200 servers has been simulated, in which the master node collects the resource utilization of all servers every minute. Because previous tests have shown that the BFL and FFL policies perform similarly, we have only tested the BFL scheduling policy here.

Fig. 7 shows the numbers of running game instances, activated servers and used servers (once used, a server is regarded as a used server no matter whether it is currently activated or not); there is an obvious linear relationship between the number of game instances and the number of activated servers. What is more, the average number of activated servers is 64, which is significantly less than the maximum number of used servers (152). This means that the scheduling efficiency is good; it also means server consolidation [37] can be used to further reduce the number of servers.

Fig. 8 shows the average resource utilizations of activated servers for each day. Although the utilization rates of the other resources are relatively low, the bandwidth utilization is high. This confirms that most games are memory-io-critical, which accords with our performance model.

We have completed another simulation, in which the number of servers is unlimited, to illustrate the relationship between the total number of used servers and the update interval of resource utilization. Fig. 9 shows this relationship; we can see that when the update interval is less than 20 minutes, the number of used servers varies only slightly. When the interval is larger, the number increases significantly. This means we could use a longer update interval with very limited impact on system efficiency. It is also helpful for managing a large-scale cloud gaming system, because message exchanges between server-agents and the manager are reduced considerably.

Fig. 6. Server-numbers in Situation 2.
TABLE 3
Details of the Dataset

Parameter                                        Value
Simulated period                                 3 months
Server number                                    200
Total game requests                              396,631
Maximum game requests arriving simultaneously    227
Maximum game instances running simultaneously    757
Average lifetime of game instances               85 minutes
Average interval between game requests           3 minutes

Fig. 7. Running games and servers of each day.
4.6 Discussions

4.6.1 Different Game Configurations and/or Heterogeneous Servers

The above work is targeted at specific hardware and games, and we believe the method is practical: it is reasonable to assume that any game will be tested fully before going online; thus the resource requirements of each game can be measured on the given server, whose hardware configuration will remain unchanged for a long period.

If heterogeneous servers are used, since we have found that the host CPU or the memory bus is the system bottleneck, new servers' capacities can also be derived by comparing the CPU performance and system bandwidth of reference servers and new servers (these metrics may be labeled by the producer or can be tested), which avoids the exponentially-growing complexity of testing. Appendix B, available in the online supplemental material, gives an example showing that the capability of a new server for known games is predictable, and then summarizes the prediction method.

For different game configurations, the situation is more complicated. Even if only the resolution differs, tests show that there is no obvious relationship between the resolution and resource consumptions, although the consumption of our framework itself (like encoding and image capture) is proportional to the resolution. Therefore, our solution is: during the real service period, such configurations can be evaluated online first. For example, we can schedule the same game with the same configurations to some dedicated server(s) if a user demands it. With the accumulation of game runs, the metrics will become more accurate.

4.6.2 Time-Dependent Factors

We use average values to denote the resource requirements of a given game. In reality, requirements are time-dependent and may vary in different gaming stages.
However, we believe average values are sufficient owing to the following facts:

1) The degree of variation depends heavily on the time granularity. Our tests show that the variation becomes smaller as the time interval increases. When the time interval is 30 s (see Appendix C, available in the online supplemental material), the variation of requirements is relatively small.

2) Considering resource consolidation across multiple concurrently-running games, the use of average values is reasonable.

Moreover, it is necessary to note that some games take a very long time to finish. Thus, in our experimental environment, it is difficult to explore plenty of scenes. However, such a game can be evaluated online first for data accumulation (as mentioned above).

5 IMPLEMENTATION AND EVALUATION

5.1 Implementation

We have implemented the cloud gaming system based on the user-level virtualization technology. Eight PC servers are connected by Gigabit Ethernet; their configurations are the same as the one in Section 4.4. Detours [39] has been used to implement the required interception functions. In detail, we have implemented a DLL (called gamedll) that can be inserted into any gaming process to wrap all APIs of interest and to spawn two threads, for input reception and data encoding/streaming respectively. Our virtualization layer can now stream Direct3D games, OpenGL games and Flash games to Windows, iOS and Android clients, and receive remote operations. The UDT (UDP-based Data Transfer) protocol [40] is used to deliver the video / audio / operation data between the server and client.

We use the periodic video capture as the timing reference on the server side; any audio data between two consecutive video-capture timestamps is delivered with the current video data. To be specific, the Windows Audio Session APIs provide interfaces to create and manage audio streams to and from audio devices. Our interception replicates such stream buffers.
After the current image has been captured, the audio data between the buffer's current read and write positions (the read position is simply the current playback position) is copied out immediately and sent with the current image. This method achieves video / audio synchronization and limits the timing discrepancy to roughly the reciprocal of the FPS value. As mentioned in Section 4.1, one exception is that games may deliberately decrease the FPS in some scenes, which causes larger timing discrepancies. To remedy this situation, a dedicated timer triggers audio transmission whenever the current interval between successive frames exceeds a threshold.

Fig. 8. Resource-utilizations of activated servers.

Fig. 9. Used servers of different update-intervals.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 27, NO. 5, MAY 2016

Moreover, on the client side, one extra audio buffer is managed by the cloud-gaming client software to smooth the playback of received audio. Any received audio is first stored into this buffer, appended to the existing data. Once the whole buffer has been filled, all of it is copied to the playback device. Combined with the default buffer of the playback device, this constructs a double-buffering mechanism, which parallelizes playback and reception and thereby smooths the playback. Consequently, any audio data is delayed for some time: in our system, the buffer is sized to hold 200 ms of audio data, which makes the playback smooth. Results are given in the next section.

5.2 Evaluation

The test environment, configurations and testing method are the same as those in Section 4.4.

5.2.1 Overheads of the User-Level Virtualization Technology Itself

We execute a game directly on a physical machine and record the game speed (in terms of average FPS) and the average memory consumption. Then the same game is run in the user-level virtualization environment (all related APIs are intercepted, but no real work, such as image capture or encoding, is enabled) and in a virtual machine, respectively, and the same runtime information is recorded again. The latest VMware Player 6 is employed, and both the host and guest OSes are Windows 7. The comparison is shown in Fig. 10 (for clarity, values have been normalized).

Considering GPU utilization, the user-level technology itself introduces almost no performance loss, while the VM-based solution's efficiency is a little lower, about 90 percent of the native.
On the other side, the memory consumption of the VM-based solution is 2.4 times that of the native, because the memory occupied by the guest OS is considerable. For the user-level solution, this consumption is almost the same as the native.

5.2.2 Processing Performance of the Server

The processing procedure of a cloud-gaming instance can be divided into four parts: (1) image capture, which copies a rendered frame into the system memory; (2) video encoding; (3) transferring, which sends each compressed frame into the network; and (4) the game-logic processing and rendering. The last part depends mainly on the concrete game, while GCloud handles the others. Thus the first three are the object of this test, and the sum of their delays is denoted SD (Server Delay).

Moreover, we intend to measure the performance limit. Hence only one instance runs on a server, and the "try the best" strategy is used: no Sleep call is inserted, so the games run as fast as possible. Existing work [3] has carried out a similar test for GamingAnywhere and OnLive, so we can compare results with theirs. Although the games tested in [3] are different, we believe the comparison is meaningful because the server delay is largely independent of specific games.

Fig. 11 reports the average SD of three video games under different resolutions; the corresponding FPS values appear in Fig. 12.

Fig. 10. Comparison of resource consumption.

Fig. 11. Processing performance and the decomposition (three resolutions).
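The SD accounting can be expressed as a simple aggregation over per-frame stage measurements. The sketch below is illustrative Python; the latencies are made up, and only their relative magnitudes mirror the reported pattern (transfer far below capture and encoding):

```python
def average_server_delay(samples):
    """samples: per-frame dicts with 'capture', 'encode' and 'transfer'
    latencies in ms. Returns (average SD, per-stage averages); the
    game-logic/rendering time is deliberately excluded, as in the text."""
    n = len(samples)
    stages = {k: sum(s[k] for s in samples) / n
              for k in ("capture", "encode", "transfer")}
    return sum(stages.values()), stages

# Made-up per-frame measurements (ms), not the paper's raw data.
samples = [
    {"capture": 3.0, "encode": 5.0, "transfer": 0.05},
    {"capture": 3.4, "encode": 5.6, "transfer": 0.05},
]
sd, stages = average_server_delay(samples)
print(round(sd, 2))   # -> 8.55; transfer is orders of magnitude smaller
```

With numbers of this shape, dropping the transfer term changes SD only in the second decimal, which is why the text treats the transfer latency as negligible.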
The average value at 720p is given in Fig. 13, together with the corresponding values of GamingAnywhere and OnLive (values have been normalized). Results show that, compared with similar solutions, GCloud achieves smaller SDs (ranging from 8 ms to 19 ms), which are positively correlated with the resolution. We attribute this mainly to the high encoding performance of Quick Sync; in contrast, the encoding delay of GamingAnywhere is about 14-16 ms per frame. The transfer latency is smaller than the other parts by two orders of magnitude, and this still holds in the following multiple-game cases. Thus the transfer latency can be neglected, as proposed in Section 4.

5.2.3 Multiple Games

Here the "just good enough" strategy is used: a Sleep call fixes the FPS. First, an OpenGL game and three Direct3D games are played one by one, and the processing delay (including the sleep time) is sampled periodically, once per frame. Second, quite a few game combinations, each including more than one game, are executed and sampled. Without loss of generality, the FPS values of some game combinations played simultaneously are presented in Table 4, together with the average absolute deviations (AADs). These combinations are:

Case 1: Two NFS instances;
Case 2: One NFS, one Combat and one Scrolls;
Case 3: Two NFS, one Combat and one Scrolls;
Case 4: One NFS, one Combat, one Scrolls and two Birds.

On the whole, the average FPS ranges from 30.5 to 31.5 when a game runs alone. The average absolute deviations are 0.10 (Birds), 0.11 (NFS), 0.15 (Combat) and 1.47 (Scrolls), respectively, which means the FPS values are fairly stable. Of course, there are quite a few delay fluctuations; they usually mean that the corresponding game scenes are changing rapidly, which is common for highly interactive games, especially Scrolls.
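The pacing just described, a Sleep call that fixes the FPS, can be sketched as follows (illustrative Python, not GCloud's actual Windows code; the per-frame workload is a stand-in). After each frame, whatever is left of the frame budget is slept away instead of rendering the next frame early:

```python
import time

def run_paced(frames, fixed_fps, frame_work):
    """Run frame_work() `frames` times, sleeping off the unused part of
    each frame budget so the loop holds roughly `fixed_fps`."""
    budget = 1.0 / fixed_fps
    start = time.perf_counter()
    for _ in range(frames):
        t0 = time.perf_counter()
        frame_work()                      # game logic, rendering, encoding
        elapsed = time.perf_counter() - t0
        if elapsed < budget:
            time.sleep(budget - elapsed)  # the inserted Sleep call
    return frames / (time.perf_counter() - start)   # achieved FPS

fps = run_paced(frames=30, fixed_fps=30, frame_work=lambda: None)
print(round(fps, 1))   # close to 30 on an idle machine
```

This is the "just good enough" trade-off: the achieved FPS can never exceed the target, and the slept time is capacity left free for the other instances sharing the server.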
As the number of concurrently-running games increases (meaning more interference between games), the FPS values decrease correspondingly while the average absolute deviations increase. For Scrolls, with three games running at the same time (Case 2), its average FPS is 28.3 and the AAD is 2.13; with four instances (Case 3), the values are 27.8 and 2.98, respectively. For Combat, with three games running simultaneously, the average FPS is 29.2 and the AAD is 0.89; with four, the values are 28.8 and 1.59, respectively.

For the uncertainties of the FPS values, we believe the main reasons lie in two aspects:

1) There is interference among the running instances, including resource contention, which makes resource consumption not totally linear in the number of instances (as illustrated in Fig. 4). For example, Scrolls consumes the most resources; thus its uncertainty is the largest.

2) As mentioned in Section 4.6, the resource requirements of games are time-dependent and may vary across stages, which also causes some uncertainty.

In any case, the system achieves a satisfactory gaming effect, and the FPS can be kept relatively stable when multiple games run simultaneously.

5.2.4 Verification of the Performance Model

According to the performance model and the scheduling strategy, we test several typical server loads for verification. Without loss of generality, the following cases are presented.

1) One Scrolls, one Combat and two NFS. As presented in Table 5 (first row), the FPS value of each game is more than 27, the lowest being Scrolls's, about 27.1. All are no less than 90 percent of the FIXED_FPS (30), so they are acceptable. Because the system-RAM bandwidth is nearly exhausted (about 93 percent of MAX_SYSTEM_BANDWIDTH), when another game joins (whether NFS or Birds), the FPS of Scrolls drops below the acceptable level.

Fig. 12. FPS of games.
Fig. 13. Comparison of the processing delay (1280 × 720; the lower the better).

TABLE 4
FPS Values and Average Absolute Deviations of Different Numbers of the Running Games

Game      Metric   Case 1   Case 2   Case 3   Case 4
NFS       FPS      30.2     30.3     30.2     30.2
          AAD      0.18     0.24     0.44     0.70
Combat    FPS      N/A      29.2     28.8     28.6
          AAD      N/A      0.89     1.59     1.89
Scrolls   FPS      N/A      28.3     27.8     27.3
          AAD      N/A      2.13     2.98     3.30
Birds     FPS      N/A      N/A      N/A      29.8
          AAD      N/A      N/A      N/A      0.56
2) One Scrolls, one Combat, one NFS and three Birds. In this case, the sum of each kind of resource consumption is less than the corresponding system capacity; the relative maximum is the sum of the memory throughputs, about 95 percent of MAX_SYSTEM_THROUGHPUT. In Table 5 (second row), the FPS value of each game is more than 27.

3) One NFS, two Combat and five Birds.

4) Three NFS and five Birds.

In Cases 3 and 4, the sum of the memory throughputs is about 96 percent of MAX_SYSTEM_THROUGHPUT. Since the sum of each kind of resource consumption is less than the corresponding system capacity, the FPS value of each game is still more than 27.

5.2.5 Discrepancy between Video and Audio

We have designed a method to calculate this discrepancy: on the server, sequences of full-black images are inserted into the video stream to replace the original scenes; at the same time, mute data replaces the corresponding audio data. On the client, a screen-recording software runs together with the gaming client. Through analysis of the audio / video streams of the recorded data, we obtain the timestamps of the beginnings of the inserted video and audio sequences, respectively, and the discrepancies can then be calculated. Results show that these values are in the range of 180 ms to 410 ms (Table 6). Besides the preset delays mentioned above, we think the reasons lie in the following:

1) The delay fluctuations of games. The corresponding FPS values will be less than 30, which increases the timing discrepancy, because the accumulation of audio data is slowed.

2) The network's delay fluctuations. They also increase the timing discrepancy. Our tests are carried out on a campus network; we believe this factor will cause more delay over the Internet.

3) The measurement error. The recording software records the screen periodically, at 30 FPS, while the audio recording is continuous.
Thus, the beginnings of some sequences of full-black images may be missed, which decreases the measured gap.

6 CONCLUSIONS AND FUTURE WORK

This paper proposes GCloud, a GPU/CPU hybrid cluster for cloud gaming based on the user-level virtualization technology. We focus on the guideline of task scheduling: to balance gaming responsiveness and cost, we fix each game's FPS to allocate just enough resources, which also mitigates the interference between games. Accordingly, a performance model has been analyzed to explore the server capacity and the games' resource demands, which can locate the performance bottleneck and guide the task scheduling based on games' critical resource demands. Comparisons show that both the First-Fit-like and Best-Fit-like scheduling strategies outperform the others; moreover, they are near optimal in the batch processing mode. In the future, we plan to enhance the performance model to support heterogeneous servers.

ACKNOWLEDGMENTS

The work is supported by the High Tech. R&D Program of China under Grant No. 2013AA01A215.

REFERENCES

[1] R. Shea, J. Liu, E. C.-H. Ngai, and Y. Cui, "Cloud gaming: Architecture and performance," IEEE Netw., vol. 27, no. 4, pp. 16–21, Jul./Aug. 2013.
[2] Z. Zhao, K. Hwang, and J. Villeta, "GamePipe: A virtualized cloud platform design and performance evaluation," in Proc. ACM 3rd Workshop Sci. Cloud Comput., 2012, pp. 1–8.
[3] C.-Y. Huang, C.-H. Hsu, Y.-C. Chang, and K.-T. Chen, "GamingAnywhere: An open cloud gaming system," in Proc. ACM Multimedia Syst., Feb. 2013, pp. 36–47.
[4] R. Phull, C.-H. Li, K. Rao, S. Cadambi, and S. T. Chakradhar, "Interference-driven resource management for GPU-based heterogeneous clusters," in Proc. 21st ACM Int. Symp. High Perform. Distrib. Comput., 2012, pp. 109–120.
[5] V. T. Ravi, M. Becchi, G. Agrawal, and S. T. Chakradhar, "Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework," in Proc. 20th ACM Int. Symp. High Perform.
Distrib. Comput., 2011, pp. 217–228.
[6] G. A. Elliott and J. H. Anderson, "Globally scheduled real-time multiprocessor systems with GPUs," Real-Time Syst., vol. 48, no. 1, pp. 34–74, 2012.
[7] L. Chen, O. Villa, S. Krishnamoorthy, and G. R. Gao, "Dynamic load balancing on single- and multi-GPU systems," in Proc. IEEE Int. Symp. Parallel Distrib. Process., 2010, pp. 1–12.
[8] M. Yu, C. Zhang, Z. Qi, J. Yao, Y. Wang, and H. Guan, "GRIS: Virtualized GPU resource isolation and scheduling in cloud gaming," in Proc. 22nd Int. Symp. High-Perform. Parallel Distrib. Comput., 2012, pp. 203–214.
[9] C. Zhang, J. Yao, Z. Qi, M. Yu, and H. Guan, "vGASA: Adaptive scheduling algorithm of virtualized GPU resource in cloud gaming," IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 11, pp. 3036–3045, 2014.
[10] M. Claypool and K. Claypool, "Latency and player actions in online games," Commun. ACM, vol. 49, no. 11, pp. 40–45, 2006.
[11] D. C. Barboza, V. E. F. Rebello, E. W. G. Clua, and H. Lima, "A simple architecture for digital games on demand using low performance resources under a cloud computing paradigm," in Proc. Brazilian Symp. Games Digital Entertainment, 2010, pp. 33–39.
[12] D. De Winter, P. Simoens, and L. Deboosere, "A hybrid thin-client protocol for multimedia streaming and interactive gaming applications," in Proc. Int. Workshop Netw. Oper. Syst. Support Digital Audio Video, 2006, p. 15.

TABLE 5
FPS of Concurrently-Running Games

TABLE 6
Discrepancy Values on the Client Side

Game      Minimum   Maximum   Average
NFS       205 ms    395 ms    287 ms
Scrolls   213 ms    410 ms    323 ms
Combat    196 ms    336 ms    278 ms
Birds     180 ms    275 ms    242 ms
[13] W. Yu, J. Li, C. Hu, and L. Zhong, "Muse: A multimedia streaming enabled remote interactivity system for mobile devices," in Proc. 10th Int. Conf. Mobile Ubiquitous Multimedia, 2011, pp. 216–225.
[14] L. Shi, H. Chen, and J. Sun, "vCUDA: GPU accelerated high performance computing in virtual machines," in Proc. IEEE Int. Symp. Parallel Distrib. Process., 2009, pp. 1–11.
[15] J. Duato, A. J. Peña, F. Silla, R. Mayo, and E. S. Quintana-Ortí, "rCUDA: Reducing the number of GPU-based accelerators in high performance clusters," in Proc. Int. Conf. High Perform. Comput. Simul., 2010, pp. 224–231.
[16] V. Gupta, A. Gavrilovska, K. Schwan, H. Kharche, N. Tolia, V. Talwar, and P. Ranganathan, "GViM: GPU-accelerated virtual machines," in Proc. ACM Workshop Syst.-Level Virtualization High Perform. Comput., 2009, pp. 17–24.
[17] M. Bautin, A. Dwarakinath, and T.-c. Chiueh, "Graphic engine resource management," in Proc. 15th Multimedia Comput. Netw., 2008, pp. 15–21.
[18] D. Wu, Z. Xue, and J. He, "iCloudAccess: Cost-effective streaming of video games from the cloud with low latency," IEEE Trans. Circuits Syst. Video Technol., vol. 24, no. 8, pp. 1405–1416, Jan. 2014.
[19] H.-J. Hong, D.-Y. Chen, C.-Y. Huang, K.-T. Chen, and C.-H. Hsu, "Placing virtual machines to optimize cloud gaming experience," IEEE Trans. Cloud Comput., vol. 3, no. 1, pp. 42–53, Jan.–Mar. 2015.
[20] S. Kato, K. Lakshmanan, R. Rajkumar, and Y. Ishikawa, "TimeGraph: GPU scheduling for real-time multi-tasking environments," in Proc. USENIX Annu. Tech. Conf., 2011, p. 2.
[21] L. Cherkasova and L. Staley, "Building a performance model of streaming media application in utility data center environment," in Proc. 3rd IEEE/ACM Int. Symp. Cluster Comput. Grid, 2003, pp. 52–59.
[22] V. Ishakian and A. Bestavros, "MORPHOSYS: Efficient colocation of QoS-constrained workloads in the cloud," in Proc. 12th IEEE/ACM Int. Symp. Cluster, Cloud Grid Comput., 2012, pp. 90–97.
[23] S. Wang and S.
Dey, "Rendering adaptation to address communication and computation constraints in cloud mobile gaming," in Proc. IEEE Global Telecommun. Conf., Dec. 2010, pp. 1–6.
[24] D. Klionsky, "A new architecture for cloud rendering and amortized graphics," M.S. thesis, School Comput. Sci., Carnegie Mellon Univ., CMU-CS-11-122. [Online]. Available: http://reports-archive.adm.cs.cmu.edu/anon/2011/abstracts/11-122.html
[25] A. Jurgelionis, P. Fechteler, P. Eisert, F. Bellotti, and H. David, "Platform for distributed 3D gaming," Int. J. Comput. Games Technol., vol. 2009, p. 1, 2009.
[26] A. Ojala and P. Tyrvainen, "Developing cloud business models: A case study on cloud gaming," IEEE Softw., vol. 28, no. 4, pp. 42–47, Jul. 2011.
[27] S.-W. Chen, Y.-C. Chang, P.-H. Tseng, C.-Y. Huang, and C.-L. Lei, "Measuring the latency of cloud gaming systems," in Proc. 19th ACM Int. Conf. Multimedia, 2011, pp. 1269–1272.
[28] S. Choy, B. Wong, G. Simon, and C. Rosenberg, "The brewing storm in cloud gaming: A measurement study on cloud to end-user latency," in Proc. 11th Annu. Workshop Netw. Syst. Support Games, 2012, p. 2.
[29] Y.-T. Lee, K.-T. Chen, H.-I. Su, and C.-L. Lei, "Are all games equally cloud-gaming-friendly? An electromyographic approach," in Proc. IEEE/ACM NetGames, 2012, pp. 109–120.
[30] K.-T. Chen, Y.-C. Chang, H.-J. Hsu, D.-Y. Chen, C.-Y. Huang, and C.-H. Hsu, "On the quality of service of cloud gaming systems," IEEE Trans. Multimedia, vol. 16, no. 2, pp. 480–495, Feb. 2014.
[31] Y. Zhang, X. Wang, and L. Hong, "Portable desktop applications based on P2P transportation and virtualization," in Proc. 22nd Large Installation Syst. Administration Conf., 2008, pp. 133–144.
[32] P. Guo, "CDE: Run any Linux application on-demand without installation," in Proc. 25th USENIX Large Installation Syst. Administration Conf., 2011, p. 2.
[33] B. Xia and Z. Tan, "Tighter bounds of the first fit algorithm for the bin-packing problem," Discrete Appl. Math., vol. 158, no.
15, pp. 1668–1675, 2010.
[34] C. Kenyon, "Best-fit bin-packing with random order," in Proc. 7th Annu. ACM-SIAM Symp. Discrete Algorithms, 1996, pp. 359–364.
[35] M. Harchol-Balter, M. E. Crovella, and C. D. Murta, "On choosing a task assignment policy for a distributed server system," J. Parallel Distrib. Comput., vol. 59, no. 2, pp. 204–228, 1999.
[36] A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica, "Dominant resource fairness: Fair allocation of multiple resource types," in Proc. 8th USENIX Symp. Netw. Syst. Des. Implementation, 2011, pp. 323–336.
[37] Y.-T. Lee and K.-T. Chen, "Is server consolidation beneficial to MMORPG? A case study of World of Warcraft," in Proc. IEEE 3rd Int. Conf. Cloud Comput., 2013, pp. 435–442.
[38] Y.-T. Lee, K.-T. Chen, Y.-M. Cheng, and C.-L. Lei, "World of Warcraft avatar history dataset," in Proc. 2nd Annu. ACM Multimedia Syst., Feb. 2011, pp. 123–128.
[39] G. Hunt and D. Brubacher, "Detours: Binary interception of Win32 functions," in Proc. 3rd USENIX Windows NT Symp., Jul. 1999, p. 14.
[40] Y. Gu and R. L. Grossman, "UDT: UDP-based data transfer for high-speed wide area networks," Comput. Netw., vol. 51, no. 7, pp. 1777–1799, May 2007.

Youhui Zhang received the BSc and PhD degrees in computer science from Tsinghua University, China, in 1998 and 2002, respectively. He is currently a professor in the Department of Computer Science, Tsinghua University. His research interests include computer architecture, cloud computing, and high-performance computing. He is a member of the IEEE and the IEEE Computer Society.

Peng Qu received the BSc degree in computer science from Tsinghua University, China, in 2013. He is currently working toward the PhD degree in the Department of Computer Science, Tsinghua University, China. His interests include cloud computing and micro-architecture.

Cihang Jiang received the BSc degree in computer science from Tsinghua University, China, in 2013.
He is currently a master's student in the Department of Computer Science, Tsinghua University, China. His research interest is cloud computing.

Weimin Zheng received the BSc and MSc degrees in computer science from Tsinghua University, China, in 1970 and 1982, respectively. He is currently a professor in the Department of Computer Science, Tsinghua University, China. His research interests include high performance computing, network storage and distributed computing. He is a member of the IEEE and the IEEE Computer Society.

For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/publications/dlib.