Conference Paper: Cross-platform estimation of Network Function Performance
Cross-Platform Estimation of Network Function
Performance
Amedeo Sapio
Department of Control and
Computer Engineering
Politecnico di Torino
Torino, Italy
amedeo.sapio@polito.it
Mario Baldi
Department of Control and
Computer Engineering
Politecnico di Torino
Torino, Italy
mario.baldi@polito.it
Gergely Pongrácz
TrafficLab
Ericsson Research
Budapest, Hungary
gergely.pongracz@ericsson.com
Abstract—This work shows how the performance of a network
function can be estimated with an error margin that is small
enough to properly support orchestration of network functions
virtualization (NFV) platforms. Being able to estimate the per-
formance of a virtualized network function (VNF) on execution
hardware of various types enables its optimal placement, while
efficiently utilizing available resources. Network functions are
modeled using a methodology focused on the identification of
recurring execution patterns and aimed at providing a platform
independent representation. By mapping the model on specific
hardware, the performance of the network function can be
estimated in terms of maximum throughput that the network
function can achieve on the specific execution platform. The
approach is such that once the basic modeling building blocks
have been mapped, the estimate can be computed automatically.
This work presents the model of an Ethernet switch and evaluates
its accuracy by comparing the performance estimation it provides
with experimental results.
Keywords—Network Functions Virtualization; Virtual Network
Function; modeling; orchestration; performance estimation;
I. INTRODUCTION
For a few years now, software network appliances have been
increasingly deployed. Initially, their appeal stemmed from
their lower cost, shorter time-to-market, and ease of upgrade
compared to purposely designed hardware devices. These
features are particularly advantageous in the case of appliances,
a.k.a. middleboxes, operating on relatively recent higher layer
protocols that are usually more complex and are possibly still
evolving. Then, with the overwhelming success and diffusion
of cloud computing and virtualization, software appliances
became natural means to ensure that network functionalities
had the same flexibility and mobility as the virtual machines
(VMs) they offer services to. Hence, value started to be seen
in the software implementation of even less complex, more
stable network functionalities. This trend led to embracing
Software Defined Networking (SDN) and Network Functions
Virtualization (NFV): the former as a hybrid hardware/software
approach to ensure high performance for lower-layer packet
forwarding while retaining a high degree of flexibility and
programmability; the latter as a virtualization solution targeting
the execution of software network functions in isolated
Virtual Machines (VMs) sharing a pool of hosts rather than
on dedicated hardware (i.e., appliances). Such a solution en-
ables virtual network appliances (i.e., VMs executing network
functions) to be provisioned, allocated a different amount of
resources, and possibly moved across data centers in little time,
which is key in ensuring that the network can keep up with
the flexibility in the provisioning and deployment of virtual
hosts in today’s virtualized data centers. Additional flexibility
is offered when coupling NFV with SDN as network traffic can
be steered through a chain of Virtualized Network Functions
(VNFs) in order to provide aggregated services. With inputs
from the industry, the NFV approach has been standardized by
the European Telecommunications Standards Institute (ETSI)
in 2013 [1].
The flexibility provided by NFV requires the ability to
effectively assign compute nodes to VNFs and allocate the
most appropriate amount of resources, such as CPU quota,
RAM, virtual interfaces. In the ETSI standard the component
in charge of taking such decisions is called orchestrator and it
can also dynamically modify the amount of resources assigned
to a running VNF when needed. The orchestrator can also
request the migration of a VNF when the current compute node
executing it is no longer capable of fulfilling the VNF perfor-
mance requirements. These tasks require the orchestrator to
be able to estimate the performance of VNFs according to the
amount of resources they can use. Such estimation must take
into account the nature of the traffic manipulation performed
by the VNF at hand, some specifics of its implementation, and
the expected amount of traffic it operates on. A good estimation
is key in ensuring higher resource usage efficiency and avoiding
adjustments at runtime.
This work presents and evaluates the model of an Ethernet
switch based on a unified modeling approach [2] applicable
to any VNF, independently of the platform it is running on.
By mapping the VNF model to a specific hardware, it is
possible to predict the maximum amount of traffic that the
VNF can sustain. In this work, the model is mapped to a
sample hardware platform and the predicted performance is
compared with the actual measurements.
The adopted modeling approach [2] is particularly valuable
because it relies on a description of VNFs in terms of
basic operations, which results in a hardware independent
notation that ensures that the model is valid for any execution
platform. In addition, the mapping of the model on a target
hardware architecture (required in order to determine the actual
performance) can be automated, hence making it easy to apply
the approach to each available hardware platform and choose
the most suitable one for execution.
After discussing related work in Section II, the modeling
approach is described in Section III. Section IV presents
the modelization of an Ethernet switch and the mapping of
the model to a general purpose hardware architecture. In
order to validate the accuracy of the approach, Section V
compares the performance estimated through the model with
actual measurements obtained by running targeted experiments
with a software implementation of the Ethernet switch on the
considered hardware platform.
II. RELATED WORK
This work applies to an Ethernet switch the approach
to network function modelization proposed in [2], providing
experimental measurements to validate the obtained model.
The modelization approach was inspired by [3] that aims
to demonstrate that the Software Defined Networks approach
does not necessarily imply lower performance compared to
purpose-built ASICs. In order to prove it, the performance of
a software implementation of an Ethernet Provider Backbone
Edge Bridge is evaluated. The execution platform considered
in this work is a hypothetical network processor, for which
a high-level model is provided. The authors do not aim at
providing a universal modelization approach for generic
network functions. Rather, their purpose is to use a specific
sample network function to demonstrate that, even for very
specific tasks, the NPU-based software implementation offers
performance only slightly lower than purpose-designed chips.
A modeling approach for describing packet processing in
middleboxes and the ways they can be deployed is presented
in [4] and applied to a NAT, a L4 load balancer, and a L7
load balancer. The proposed model is not aimed at estimating
performance and resources requirements, but it rather focuses
on accurately describing middleboxes functionalities to support
decisions in their deployment.
On the other hand, a VNF modeling approach aimed at
performance estimation would be greatly beneficial to cloud
platforms where the performance of the network infrastructure
is taken into account when placing VMs [5]–[7]. For example,
[7] describes the changes needed in the OpenStack software
platform, the open-source reference cloud management system,
to enable the Nova scheduler to plan VM allocation based on
network property data and a set of constraints provided by the
orchestrator. We argue that in order to infer such constraints,
the orchestrator needs a VNF model like the ones generated
by the approach presented in this paper.
III. METHODOLOGY
The proposed modeling approach is based on the definition
of a set of processing steps, here called Elementary Operations
(EOs), that are common throughout various NF implementa-
tions. This stems from the observation that, generally, most
NFs perform a rather small set of operations when processing
the average packet, namely, a well-defined alteration of packet
headers, coupled with a data structure lookup.
An EO is informally defined as the longest sequence of
elementary steps (e.g., CPU instructions or ASIC transactions)
that is common among the processing tasks of multiple NFs. As a
consequence, an EO has variable granularity, ranging from a
simple I/O or memory load operation to a whole IP checksum
computation. On the other hand, EOs are defined so that each
can be potentially used in multiple NF models.
An NF is modeled as a sequence of EOs that represent the
actions performed for the vast majority of packets. Since we
are interested in performance estimation, we ignore handling
that affects only a small number of packets (i.e., less than 1%),
since these tasks have a negligible impact on performance,
even when they are more complex and resource intensive
than the most common ones. Accordingly, exceptional events, such
as failures, configuration changes, etc., are not considered.
It is important to highlight that NF models produced with
this approach are hardware independent, which ensures that
they can be applied when NFs are deployed on different
execution platforms. In order to estimate the performance of an
NF on a specific hardware platform, each EO must be mapped
on the hardware components involved in its execution and
their features. This mapping makes it possible to take into
consideration the limits of the involved hardware components
and to gather a set of constraints that affect performance (e.g., clock
frequency). Moreover, the load incurred by each component
when executing each EO must be estimated, whether through
actual experiments or based on nominal hardware specifica-
tions. The data collected during such mapping are specific to
EOs and the hardware platform, but not to a particular NF.
Hence, they can be applied to estimate the performance of any
NF starting from its model. Specifically, the performance of
each individual EO involved in the NF model is computed and
composed considering the cumulative load that all EOs impose
on the hardware components of the execution platform, while
heeding all of the applicable constraints.
Figure 1 summarizes the steps and intermediate outputs of
the proposed approach.
Fig. 1: NF modeling and performance estimation approach.
Table I presents a sample list of EOs that we identified
when modeling a number of NFs. Such a list is by no means
meant to be exhaustive; rather, it should be incrementally
extended whenever it turns out that a new NF being considered
cannot be described with previously identified EOs. When
defining an EO, it is important to identify the parameters
related to traffic characteristics that significantly affect the
execution and resource consumption.
TABLE I: Sample list of EOs

  #  EO            Parameters     Description
  1  mem_I/O       L1n, L2n       Packet copy between I/O and (cache) memory
  2  parse         b              Parsing a data field
  3  increase      b              Increase/decrease a field
  4  array_access  es, max        Direct access to a byte array in memory
  5  hash_lookup   N, HE, max, p  Simple hash table lookup
  6  checksum      b              Compute IP checksum
  7  sum           b              Sum 2 operands
A succinct description of the EOs listed in Table I is
provided below.
1) Packet copy between I/O and memory:
A packet is copied from/to an I/O buffer to/from
memory. L1n is the number of bytes that are prefer-
ably stored in L1 cache memory, otherwise in L2
cache or external RAM. L2n bytes are preferably
stored in L2 cache memory, otherwise in external
RAM. The parameters have been chosen taking into
consideration that some NPUs provide a manual
cache that can be explicitly loaded with the data
that need fast access. General purpose CPUs may
have assembler instructions (e.g., PREFETCHh) to
explicitly influence the cache logic.
2) Parsing a data field:
A data field of b bytes stored in memory is parsed.
A parsing operation is necessary before performing
any computation on a field (this corresponds to loading
a processor register). This EO can also be used to
model the dual operation, i.e., encapsulation, which
implies storing back into memory a properly con-
structed sequence of fields.
3) Increase/decrease a field:
Increase/decrease the numerical value contained in
a field of b bytes. The field to increase must have
already been parsed.
4) Direct access to a byte array in memory:
This EO performs a direct access to an element of
an array in memory using an index. Each array entry
has size es, while the array has at most max entries.
5) Simple hash table lookup:
A simple lookup in a direct, XOR based hash table is
performed. The hash key consists of N components
and each entry has size equal to HE. The table has
at most max entries. The collision probability is p.
6) Compute IP checksum:
The standard IP checksum computation is performed
on b bytes.
7) Sum 2 operands:
Two operands of b bytes are added.
For the sake of simplicity (and without affecting the
validity of the approach, as shown by the results in Section V),
in modeling NFs by means of EOs, we assume that the number
of processor registers is larger than the number of packet fields
that must be processed simultaneously. Therefore there is no
competition for this resource.
IV. A MODELING USE CASE
This section demonstrates the application of the modeling
approach described in the previous section. EOs are used to
describe the operation of an Ethernet switch and then they are
mapped to a general purpose hardware platform.
A. Ethernet Switch Model
For each packet the switch selects the output interface
where it must be forwarded, retrieving it from a hash table
keyed by the destination MAC address extracted from the
packet.
When the network interface receives a packet, it is firstly
stored in an I/O buffer. In order to access the Ethernet header,
the CPU/NPU must first copy the packet in cache or main
memory. Since the switch operates only on the Ethernet header
that is of limited size (14 bytes), it is copied in the L1 cache,
while the rest of the packet (up to 1486 bytes) can be copied in
L2 cache or main memory. To ensure generality, we consider
that an incoming packet cannot be copied directly from an I/O
buffer to another, instead it must be first copied in (cache)
memory in any case.
The switch must then read the destination MAC address
(6 bytes) prior to using it to access the hash table to get the
appropriate output interface. The hash table has one key (the
destination MAC) and consists of 12 byte entries composed
by the key and the output interface MAC address.
Here we considered that the output interface is identified
by its Ethernet address. Different implementations can use a
different identifier, which leads to a minor variation in the
model.
The average number of entries in a real-case scenario is
≈ 2M, which gives an indication of whether the table can be
fully stored in cache under any traffic conditions. Here we assume
that the collision probability is negligible (i.e., the hash table
is sufficiently sparse).
The packet can then be moved to the buffer of the selected
output I/O device. The resulting model is summarized in
Figure 2.
mem_I/O(14, 1486)
parse(6)
hash_lookup(1, 12, 2M, 0)
mem_I/O(14, 1486)
Fig. 2: Ethernet switch model.
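For reference, the EO sequence of Fig. 2 can be written out as plain data (a sketch; the representation is ours, with parameter names following Table I):

```python
# The EO sequence of Fig. 2 written out as data (our representation;
# parameter names follow Table I).
ethernet_switch_model = [
    ("mem_I/O", {"L1n": 14, "L2n": 1486}),  # header to L1, rest to L2/RAM
    ("parse", {"b": 6}),                    # read the destination MAC
    ("hash_lookup", {"N": 1, "HE": 12, "max": 2_000_000, "p": 0}),
    ("mem_I/O", {"L1n": 14, "L2n": 1486}),  # copy to the output I/O buffer
]
print(len(ethernet_switch_model), "EOs per packet")
```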
B. Mapping to Hardware
We now proceed to map the described EOs to a specific
hardware platform. Figure 3 provides a schematic representa-
tion of the platform main components and relative constraints
using the template proposed in [3]: an Intel® Xeon E5-2630
CPU, a DDR3 RAM module, and a 10Gb Ethernet Controller.
Using the CPU reference manual [8], it is possible to
determine the operations required for the execution of each
EO in Table I and estimate the achievable performance.
Fig. 3: Hardware architecture description (Intel Xeon E5-2630):
  - x86-64: 6 cores/slot, 2 threads/core, 2.3-2.8 GHz, AVX, VT-d, VT-x + EPT
  - L1 cache: per core, i=32KB, d=32KB
  - L2 cache: per core, 256 KB
  - L3 cache: per slot, 15 MB
  - MCT: 4 channels, DDR3, max 340.8 Gbps
  - DDR3: 1333 Mtps, max 85.2 Gbps, CAS latency 9
  - I/O: PCIe v3.0, 8 Gtps, 126 Gbps (x16)
  - 2x 10 GbE: 5 Gtps, PCIe v2.0 (x8), max 32 Gbps
1. mem_I/O(L1n, L2n)
The CPU L1 and L2 data caches can move one line per
cache cycle, i.e., 512 bits (64 bytes) in 4 clock cycles and
12 clock cycles respectively, and their maximum sizes are
32 KB and 256 KB, respectively. Moreover, read and write
operations in I/O buffers require on average 40 clock cycles.
On the whole, the execution of this EO requires:

    4 ∗ ⌈min(32KB, L1n) / 64B⌉
  + 12 ∗ ⌈min(256KB, max(0, L1n − 32KB) + L2n) / 64B⌉
  + 40 ∗ ⌈(L1n + L2n) / 64B⌉

clock cycles and

    ⌈max(0, max(0, L1n − 32KB) + L2n − 256KB) / 64B⌉

L3 cache or DRAM accesses.
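The cost formula can be turned into a small helper (a sketch, our code), using the per-line costs quoted above: 4 cycles per 64-byte line in L1, 12 in L2, and 40 for I/O buffer reads/writes.

```python
import math

# Sketch (our code) of the mem_I/O(L1n, L2n) cost on this CPU, using the
# per-line costs quoted in the text: 4 cycles (L1), 12 (L2), 40 (I/O).
L1_SIZE, L2_SIZE, LINE = 32 * 1024, 256 * 1024, 64

def mem_io_cycles(l1n, l2n):
    lines = lambda b: math.ceil(b / LINE)
    spill_l2 = max(0, l1n - L1_SIZE) + l2n   # bytes that do not fit in L1
    return (4 * lines(min(L1_SIZE, l1n))
            + 12 * lines(min(L2_SIZE, spill_l2))
            + 40 * lines(l1n + l2n))

def mem_io_dram_accesses(l1n, l2n):
    spill_l2 = max(0, l1n - L1_SIZE) + l2n
    return math.ceil(max(0, spill_l2 - L2_SIZE) / LINE)

# The switch model's mem_I/O(14, 1486): 1 L1 line, 24 L2 lines, 24 I/O lines
print(mem_io_cycles(14, 1486))  # 4*1 + 12*24 + 40*24 = 1252
```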
2. parse(b)
Loading a 64 bit register requires 4 clock cycles if data is
in L1 cache or 12 clock cycles if data is in L2 cache, otherwise
an additional L3 cache or DRAM memory access is required
to retrieve a 64 byte line and store it in L1 or L2 respectively:
    4 ∗ ⌈b / 8B⌉ clock cycles   {+ ⌈b / 64B⌉ L3 or DRAM accesses}

or

    12 ∗ ⌈b / 8B⌉ clock cycles   {+ ⌈b / 64B⌉ L3 or DRAM accesses}
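As a sketch (helper names are ours), the parse(b) cost can be expressed as:

```python
import math

# Sketch (our helper names) of the parse(b) cost: one 8-byte register load
# costs 4 cycles from L1 or 12 from L2; a miss also fetches 64-byte lines.
def parse_cycles(b, level="L1"):
    per_load = 4 if level == "L1" else 12
    return per_load * math.ceil(b / 8)

def parse_miss_line_fetches(b):
    return math.ceil(b / 64)  # additional L3/DRAM accesses on a miss

print(parse_cycles(6))  # the switch parses the 6-byte destination MAC: 4
```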
3. increase(b)
Whether a processor includes an increase instruction or one
for adding a constant value to a 64 bit register, this EO requires
1 clock cycle to complete. However, thanks to pipelining, up
to 3 independent such instructions can be executed during 1
clock cycle:
    ⌈0.33 ∗ b / 8B⌉ clock cycles
4. array_access(es, max)
Direct array access needs to execute an “ADD” instruction
(1 clock cycle) for computing the index and a “LOAD” instruction,
resulting in a direct memory access and as many clock
cycles as the number of CPU registers required to load the
selected array element:

    1 + ⌈es / 8B⌉ clock cycles + ⌈es / 64B⌉ DRAM accesses
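The same cost, as a sketch (helper names are ours):

```python
import math

# Sketch (our helper names) of the array_access(es, max) cost: one ADD for
# the index, one register load per 8 bytes of entry, one DRAM line per 64 B.
def array_access_cycles(es):
    return 1 + math.ceil(es / 8)

def array_access_dram(es):
    return math.ceil(es / 64)

print(array_access_cycles(12), array_access_dram(12))  # 3 1
```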
5. hash_lookup(N, HE, max, p)
We assume that a simple hash lookup is implemented
according to the pseudo-code described in [3] and shown in
Figure 4 for ease of reference.
Register $1-N: key components
Register $HL: hash length
Register $HP: hash array pointer
Register $HE: hash entry size
Register $Z: result
Pseudo code:
# hash key calculation
eor $tmp, $tmp
for i in 1 ... N
eor $tmp, $i
# key is available in $tmp
# calculate hash index from key
udiv $tmp2, $tmp, $HL
mls $tmp2, $tmp2, $HL, $tmp
# index is available in $tmp2
# index -> hash entry pointer
mul $tmp, $tmp2, $HE
add $tmp, $HP
# entry pointer available in $tmp
<prefetch entry to L1 memory>
# pointer to L1 entry -> $tmp2
# hash key check (entry vs. key)
for i in 1 ... N
ldr $Z, [$tmp2], #4
# check keys
cmp $i, $Z
bne collision
# no jump means matching keys
# pointer to data available in $Z
Fig. 4: Hash lookup pseudo-code.
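For readers less familiar with the assembly-style pseudo-code, the same lookup can be sketched in a higher-level form (our rendition; the entry layout, storing the key components alongside the value, is an assumption):

```python
# A higher-level rendition (ours) of the XOR-hash lookup in Fig. 4. The
# entry layout is an assumption: each slot stores its key components and a
# value, so the key check of the pseudo-code can be expressed directly.
def hash_lookup(table, key_components):
    key = 0
    for k in key_components:      # eor loop: XOR the key components
        key ^= k
    index = key % len(table)      # udiv/mls: key modulo table length
    entry = table[index]          # mul/add (+ prefetch): locate the entry
    if entry is not None and entry[0] == list(key_components):
        return entry[1]           # keys match: data pointer in $Z
    return "collision"            # keys differ or slot empty

table = [None] * 8
table[(0x2A ^ 0x07) % 8] = ([0x2A, 0x07], "eth1")
print(hash_lookup(table, [0x2A, 0x07]))  # eth1
```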
Considering that the hash entry needs to be loaded from
memory to L1 cache, a simple hash lookup would require
approximately:
    ⌈(4 ∗ N + 106 + 4 ∗ ⌈HE / 8B⌉ + 4 ∗ ⌈HE / 32B⌉) ∗ (1 + p)⌉

clock cycles and

    ⌈⌈HE / 64B⌉ ∗ (1 + p)⌉

DRAM accesses.
Otherwise, if the entry is already in the cache, the memory
accesses and cache store operations are not required. Notice
that in order for the whole table to be in cache, its size should
be limited to:
max ∗ HE ≤ 32KB + 256KB = 288KB
So, in the average case, a mix of cache hits and misses will
take place, depending on the specific traffic profile.
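The cost formula and the cache-fit constraint can be sketched together (helper names are ours; the 106-cycle constant comes from the formula above):

```python
import math

# Sketch (our helper names) of the hash_lookup(N, HE, max, p) cost and of
# the 288 KB cache-fit constraint; the 106-cycle constant is from the text.
def hash_lookup_cycles(N, HE, p):
    return math.ceil((4 * N + 106 + 4 * math.ceil(HE / 8)
                      + 4 * math.ceil(HE / 32)) * (1 + p))

def hash_lookup_dram(HE, p):
    return math.ceil(math.ceil(HE / 64) * (1 + p))

def table_fits_in_cache(max_entries, HE):
    return max_entries * HE <= (32 + 256) * 1024  # L1 + L2 = 288 KB

print(hash_lookup_cycles(1, 12, 0))        # 122: the switch model's lookup
print(table_fits_in_cache(2_000_000, 12))  # False: a 2M-entry table spills
```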
6. checksum(b)
Figure 5 shows a sample assembly code to compute a
checksum on an Intel® x86-64 processor. Assuming that the
data on which the checksum is computed is not in the L1/L2 cache,
according to the Intel® documentation [8], the execution of
this code requires:

    7 ∗ ⌈b / 2⌉ + 8 clock cycles + ⌈b / 64B⌉ L3 or DRAM accesses
Register ECX: number of bytes b
Register EDX: pointer to the buffer
Register EBX: checksum
CHECKSUM_LOOP:
XOR EAX, EAX ;EAX=0
MOV AX, WORD PTR [EDX] ;AX <- next word
ADD EBX, EAX ;add to checksum
SUB ECX, 2 ;update number of bytes
ADD EDX, 2 ;update buffer
CMP ECX, 1 ;check if ended
JG CHECKSUM_LOOP ;loop while bytes remain
MOV EAX, EBX ;EAX=EBX=checksum
;EAX=checksum>>16 EAX is the carry
SHR EAX, 16
AND EBX, 0xffff ;EBX=checksum&0xffff
;EAX=(checksum>>16)+(checksum&0xffff)
ADD EAX, EBX
MOV EBX, EAX ;EBX=checksum
SHR EBX, 16 ;EBX=checksum>>16
ADD EAX, EBX ;checksum+=(checksum>>16)
MOV checksum, EAX ;checksum=EAX
Fig. 5: Sample Intel® x86 assembly code for checksum
computation.
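The assembly in Fig. 5 computes a folded 16-bit sum. A higher-level sketch of the same computation (our rendition; words are assembled big-endian here for clarity, whereas the x86 code loads them little-endian, and the final one's-complement negation of the IP checksum proper is not in the listed loop, so it is omitted here as well):

```python
# A higher-level sketch (ours) of the folded-sum loop in Fig. 5. The final
# one's-complement negation of the IP checksum is intentionally omitted,
# matching the listed assembly.
def ip_checksum_sum(data: bytes) -> int:
    csum = 0
    for i in range(0, len(data) - 1, 2):   # ADD EBX, EAX per 16-bit word
        csum += (data[i] << 8) | data[i + 1]
    if len(data) % 2:                      # odd trailing byte, if any
        csum += data[-1] << 8
    csum = (csum >> 16) + (csum & 0xFFFF)  # fold the carry once
    csum += csum >> 16                     # fold any remaining carry
    return csum & 0xFFFF

# Example from RFC 1071: the folded sum of these 8 bytes is 0xDDF2
print(hex(ip_checksum_sum(bytes([0x00, 0x01, 0xF2, 0x03,
                                 0xF4, 0xF5, 0xF6, 0xF7]))))
```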
7. sum(b)
On the considered architecture, the execution of this EO is
equivalent to the increase(b) EO. Please note that this is
not necessarily the case on every architecture.
TABLE II: Estimates for different packet sizes

  Packet size (bytes)   Mpps    Gbps
     64                12.05    7.91
    128                 8.38    9.69
    256                 5.21   11.24
    512                 2.97   12.34
   1024                 1.59   13.01
   1500                 1.09   12.95
C. Performance Estimation
Using the above mapping of EOs in the Ethernet switch
model devised in Section IV-A and shown in Figure 2, we
can estimate that forwarding a packet of the maximum size
(1500 bytes) requires:
2630 clock cycles + 1 DRAM access
As a consequence, a single core of an Intel® Xeon E5-2630
operating at 2.8 GHz can process ≈ 1.09 Mpps, while
the DDR3 memory can support 70.16 Mpps. The memory
throughput is estimated considering that each packet requires
a 12 byte memory access to read the hash table entry and the
time to read the second 8-byte word from memory is:

    ((CAS latency ∗ 2) + 1) / data rate
As a result a single core can process ≈ 12.95 Gbps.
If we consider minimum size (64 byte) packets (i.e., an
unrealistic, worst case scenario), the Ethernet switch requires:
238 clock cycles + 1 DRAM access
which means that a single core at 2.8 GHz can process ≈
12.05 Mpps (while the load and throughput of the memory
remain the same), which translates into ≈ 7.9 Gbps. Estimates
calculated for different packet sizes are reported in Table II.
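The arithmetic above can be reproduced with the mapped EO costs (a sketch, our code): it yields the 2630- and 238-cycle budgets and the ≈ 70.16 Mpps memory bound. With the nominal 2.8 GHz clock the per-core rate comes out slightly below the quoted 1.09 Mpps, a gap attributable to rounding in the assumed clock frequency.

```python
import math

# Sketch (our code): per-packet cycle budget of the Ethernet switch model,
# combining the EO costs mapped in Section IV-B. Cache-size clamps are
# omitted since 14 B < 32 KB (L1) and 1486 B < 256 KB (L2).
CLOCK_HZ = 2.8e9     # one Xeon E5-2630 core at max turbo frequency
DATA_RATE = 1333e6   # DDR3-1333: transfers per second
CAS = 9              # CAS latency

def switch_cycles(l1n, l2n):
    lines = lambda b: math.ceil(b / 64)
    mem_io = 4 * lines(l1n) + 12 * lines(l2n) + 40 * lines(l1n + l2n)
    parse = 4 * math.ceil(6 / 8)          # parse(6): destination MAC
    lookup = 4 * 1 + 106 + 4 * 2 + 4 * 1  # hash_lookup(1, 12, 2M, 0)
    return 2 * mem_io + parse + lookup    # two mem_I/O EOs per packet

print(switch_cycles(14, 1486))   # 2630 cycles for 1500-byte packets
print(switch_cycles(14, 50))     # 238 cycles for 64-byte packets
print(round(CLOCK_HZ / switch_cycles(14, 1486) / 1e6, 2), "Mpps (CPU bound)")
print(round(DATA_RATE / (2 * CAS + 1) / 1e6, 2), "Mpps (memory bound)")
```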
V. EXPERIMENTAL VALIDATION
In order to evaluate the accuracy of the estimates provided
by the proposed modeling approach, in this section we show
measurements made in a lab setting with software switch
implementations running on the presented hardware platform.
Three software switches are used in the experiments: Open
vSwitch (OVS), eXtensible DataPath Daemon (xDPd) and
Ericsson Research Flow Switch (ERFS). These switches are
configured via the OpenFlow protocol to perform a single des-
tination MAC address-based output port selection and forward
packets on the selected interface. The execution platform is
equipped with two Xeon E5-2630 processors whose model
is provided in Figure 3. To minimize the interference of the
operating system drivers, the network interfaces are managed
by the Intel® DPDK drivers. These drivers are designed for
fast packet processing enabling applications (i.e., the switch
implementation in this case) to receive and send packets
directly from/to a network interface card within the minimum
possible number of CPU cycles. A separate PC with the same
hardware configuration is used as a traffic generator with
the DPDK based pktgen traffic generator that is capable of
saturating a 10GbE link with minimum size packets.
Fig. 6: Performance with 100 flows (throughput in Mpps vs.
packet size in bytes; curves: Estimate, OVS-DPDK, ERFS,
xDPd-0.6).
The test traffic consists of Ethernet packets with different
destination MAC addresses in order to prevent inter-packet
caching. The total number of packets sent for each test is equal
to 100,000,000 × the transmission rate (in Gbps). The generator
PC is also used to compute statistics on received packets.
Figure 6 shows the results obtained using each of the
above listed switches and generating 100 concurrent flows with
different destination MAC addresses. From the results it is
clear that in this scenario the switches can achieve throughput
up to the link capacity except with very small packets. The
estimated value is above the measured value, as expected, since
the estimation considers the hardware computational capability
and not the transmission rate of the physical links. For small
packets the fully-optimized pipeline of ERFS outperforms
xDPd and OVS. With 64-byte packets the measured throughput
of ERFS significantly exceeds the estimated value, which in
turn is above the measured values for the other two switches.
In order to further test the accuracy of the estimates, we
run additional tests with bi-directional flows. The generated
traffic has the same characteristics as the previous tests and in
this case we calculate the aggregate statistics on all output
interfaces. In this way the traffic processed by the switch
can hypothetically reach 20 Gbps. We test this configuration
with increasing packet sizes, until the link capacity is reached.
The results obtained, which involved 2 different cores, are
presented in Figure 7, together with the values estimated with
the modeling approach. As correctly estimated, a rate of only
around 22 Mpps can be reached with small packets. As is
visible, version 0.6 of xDPd has internal scalability problems,
while the other two switches are capable of scaling as needed. The
above results show that the model provides a good estimation
of the throughput limit. In the case of bi-directional flows
the computed estimation has a 9% error for 64 byte packets,
0.2% for 128 byte packets and 6% for 256 byte packets. The
error increases for bigger packets because the computational
capabilities, which are what the model takes into account, are
no longer the factor limiting performance.
The results show that the proposed modeling approach
provides the means to produce a valuable estimation of network
function performance. This methodology will be further improved
by also considering the effects of packet interaction and
concurrency.

Fig. 7: Performance with 100 flows and bi-directional traffic
(using 2 cores)
ACKNOWLEDGMENT
This work was conducted within the framework of the FP7
UNIFY project¹, which is partially funded by the Commission
of the European Union. Study sponsors had no role in writing
of the European Union. Study sponsors had no role in writing
this report. The views expressed do not necessarily represent
the views of the authors’ employers, the UNIFY project, or
the Commission of the European Union.
REFERENCES
[1] “ETSI ISG for NFV, ETSI GS NFV-INF 001, Network
Functions Virtualisation (NFV); Infrastructure Overview,”
http://www.etsi.org/deliver/etsi_gs/NFV-INF/001_099/001/01.01.01_60/gs_NFV-INF001v010101p.pdf, [Online; accessed 19-May-2015].
[2] M. Baldi and A. Sapio, “A network function modeling approach for
performance estimation,” in 2015 IEEE 1st International Forum on
Research and Technologies for Society and Industry Leveraging a better
tomorrow (RTSI 2015), Torino, Italy, Sep. 2015.
[3] G. Pongrácz, L. Molnár, Z. L. Kis, and Z. Turányi, “Cheap silicon: a myth
or reality? picking the right data plane hardware for software defined
networking,” in Proceedings of the second ACM SIGCOMM workshop
on Hot topics in software defined networking. ACM, 2013, pp. 103–108.
[4] D. Joseph and I. Stoica, “Modeling middleboxes,” Network, IEEE,
vol. 22, no. 5, pp. 20–25, 2008.
[5] A. Gember, A. Krishnamurthy, S. S. John, R. Grandl, X. Gao, A. Anand,
T. Benson, A. Akella, and V. Sekar, “Stratos: A network-aware orches-
tration layer for middleboxes in the cloud,” Technical Report, Tech. Rep.,
2013.
[6] J. Soares, M. Dias, J. Carapinha, B. Parreira, and S. Sargento,
“Cloud4nfv: A platform for virtual network functions,” in Cloud Net-
working (CloudNet), 2014 IEEE 3rd International Conference on. IEEE,
2014, pp. 288–293.
[7] F. Lucrezia, G. Marchetto, F. G. O. Risso, and V. Vercellone, “Intro-
ducing network-aware scheduling capabilities in openstack,” Network
Softwarization (NetSoft), 2015 IEEE 1st Conference on, 2015.
[8] “Intel 64 and IA-32 Architectures Optimization Reference Manual,”
http://www.intel.com/content/www/us/en/architecture-and-technology/
64-ia-32-architectures-optimization-manual.html, [Online; accessed
19-May-2015].
¹http://www.fp7-unify.eu/