Considerations for Building Your Private Cloud.pdf
1. Considerations for Building a Private Cloud
Ryan Richard, OpenStack Engineer
ryan.richard@rackspace.com
@rackninja
October 12, 2012
Thursday, October 18, 12
3. What is a Private Cloud?
Generally considered to be smaller than a “public” cloud
Less than 100 physical servers (for this talk)
API endpoints may not be publicly accessible
Limited inbound connectivity. Use floating IPs to allow for inbound connectivity
Can be customized for specific workloads (hardware/network/etc)
Company may leverage multiple private clouds
RACKSPACE® HOSTING | WWW.RACKSPACE.COM
6. Build with the End in Mind
What are you building for?
A. Are you building for 10 servers? 20? 100?
B. Or are you building 500 instances? 1000? 2000?
C. Or are you building 400 CPUs? 3TB RAM? 100TB disk?
D. ALL OF THE ABOVE
8. Build with the End in Mind
Example hardware
12 Physical Cores
  - 24 w/ Hyperthreading
  - 48 vcpus w/ 2:1 overcommit ratio
128GB of RAM
  - 1:1 overcommit ratio
8 x 300GB drives, RAID 10
  - ~1.2 TB usable disk space
How many instances can I run on this physical host?
(total VCPUs / smallest flavor #VCPUs) = maximum # of instances
Double or quadruple this to account for growth
  - size of fixed network range
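
The sizing rule above can be sketched in a few lines of Python. This is a minimal sketch, not a sizing tool: the function names, the 1-VCPU smallest flavor, and the 20-host count are assumptions made for illustration; only the core count, hyperthreading, and the 2:1 overcommit ratio come from the example hardware.

```python
def total_vcpus(physical_cores, hyperthreading=True, cpu_overcommit=2):
    """VCPUs a host can offer: cores x threads per core x overcommit ratio."""
    threads_per_core = 2 if hyperthreading else 1
    return physical_cores * threads_per_core * cpu_overcommit


def max_instances(host_vcpus, smallest_flavor_vcpus):
    """The slide's rule: total VCPUs / smallest flavor #VCPUs."""
    return host_vcpus // smallest_flavor_vcpus


def fixed_range_target(instances_per_host, hosts, growth_factor=2):
    """Addresses to plan for in the fixed network.

    growth_factor is 2 or 4 ("double or quadruple" on the slide).
    """
    return instances_per_host * hosts * growth_factor


vcpus = total_vcpus(12)             # 12 cores, hyperthreading, 2:1 overcommit
per_host = max_instances(vcpus, 1)  # assumed smallest flavor: 1 VCPU
target = fixed_range_target(per_host, hosts=20)  # hosts=20 is hypothetical
print(vcpus, per_host, target)
```

RAM (128GB at 1:1) and disk (~1.2 TB) can impose a lower limit than VCPUs depending on the flavor; this sketch covers only the VCPU rule from the slide.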
17. Build with the End in Mind
Networking
Networking is the important part, get it right!
We can build a cloud with 2 networks (3 if using floating IPs)
Host Network (physical machine access, OpenStack services)
  - Easy to add physical nodes and/or networks
Fixed Network (instance network)
  - Don’t try to change the fixed network once in production
Floating network
  - Easy to add additional floating networks
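
Because the fixed network is hard to change once in production, it helps to check a network plan before deploying. The following is a minimal sketch using Python's standard `ipaddress` module; the function names and every CIDR in `plan` are hypothetical and not taken from the talk.

```python
import ipaddress


def smallest_prefix_for(address_count):
    """Longest IPv4 prefix whose network still holds address_count hosts."""
    needed = address_count + 2  # network and broadcast addresses
    host_bits = (needed - 1).bit_length()
    return 32 - host_bits


def overlapping_pairs(networks):
    """Return sorted name pairs whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    names = sorted(nets)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if nets[a].overlaps(nets[b])
    ]


# Hypothetical plan: all three ranges are illustrative only.
plan = {
    "host": "10.0.0.0/24",       # physical machine access, OpenStack services
    "fixed": "10.1.0.0/21",      # instance network, sized for growth up front
    "floating": "192.0.2.0/24",  # inbound connectivity
}
print(smallest_prefix_for(1920))
print(overlapping_pairs(plan))
```

Here `smallest_prefix_for` turns a target instance count into a prefix length for the fixed range, and `overlapping_pairs` flags any two networks that collide.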
18. Build with the End in Mind
Glance
Disk space on server acting as glance backend (file based) will be a limiting factor.
Good alternatives: Swift, CloudFiles, NFS (locally mounted)
Local disk is considerably faster than the alternatives
Will you be leveraging snapshots? If so, disk space will need to be a serious consideration
If using qcow2, set “snapshot_image_format=qcow2” to help limit disk usage
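
To see why snapshots make disk space a serious consideration, a rough estimate helps. This is a back-of-the-envelope sketch only: the function name and every number in the example call are assumptions, not measurements from the talk.

```python
def glance_storage_estimate_gb(base_image_sizes_gb, instances,
                               snapshots_per_instance, avg_snapshot_gb):
    """Rough backend size: base images plus all retained snapshots."""
    base = sum(base_image_sizes_gb)
    snapshots = instances * snapshots_per_instance * avg_snapshot_gb
    return base + snapshots


# Hypothetical cloud: two base images, 500 instances, 2 snapshots each,
# averaging 5 GB per snapshot.
estimate = glance_storage_estimate_gb([2, 16], 500, 2, 5)
print(estimate, "GB")
```

Even with modest per-snapshot sizes, the snapshot term dominates the base images, which is the point of the slide.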
21. Build with the End in Mind
Glance Performance
Network throughput is a limitation
  - 1000Mb/s = 125MB/s max (expect ~112MB/s realistically)
Large sequential read/writes
  - RAID5 may be preferred
Lean towards disk bandwidth over raw IOPs
Reduce # of images to allow for more efficient local caches on compute nodes (dramatically increasing performance of instance creation)

Image Size    Not Cached     Cached
1.4GB         20secs         1sec
16.4GB        2min 21secs    1sec
*times from “creating image” to “qemu-img create”
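
The throughput arithmetic above can be sketched as follows. This is a minimal estimate that assumes 1 GB = 1000 MB and counts only network transfer, so it ignores disk and Glance overhead; the function names are illustrative.

```python
def line_rate_mb_per_s(link_mbit_per_s):
    """Convert link speed in megabits/s to megabytes/s (8 bits per byte)."""
    return link_mbit_per_s / 8


def transfer_seconds(image_gb, throughput_mb_per_s=112.0):
    """Time to move an image at the given throughput, assuming 1 GB = 1000 MB."""
    return image_gb * 1000.0 / throughput_mb_per_s


print(line_rate_mb_per_s(1000))
for size_gb in (1.4, 16.4):
    print(size_gb, "GB:", round(transfer_seconds(size_gb)), "s")
```

At the realistic ~112MB/s figure, the estimates for the two image sizes in the table are of the same order as the uncached times measured there, which suggests that the network accounts for most of the uncached cost.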
24. To Swift or not to Swift?
Pros
  - Scalable object storage that works great as a backend for Glance
  - Can be leveraged as object storage for other parts of the business
  - Ability to quickly increase the amount of storage available
  - Extremely stable if designed correctly
Cons
  - Additional expertise needed to run Swift
  - Architecture (network/swift components) design is important to get right
  - Depending on initial usage, there may be high up front costs to populate 5 zones
25. Architecture Examples and Thoughts
1 - 20 physical servers
  - Single controller (single API endpoint, single scheduler, etc) should suffice
  - Single network (1Gbps) for instance connectivity and OpenStack services is sufficient
  - Rackspace “Alamo” installer
20-50 physical servers
  - Single controller (single API endpoint, single scheduler, etc) should suffice
  - Investigate Swift as a glance backend.
  - Start looking into ways to break apart various controller services
28. Architecture Examples and Thoughts
50-100 servers
Keep an eye on the scheduler to make sure it’s not a bottleneck
Strongly consider swift especially for snapshots
Consider Availability Zones/Cells (didn’t make it into Folsom)
Consider “frontend” and “backend” networks for instances
  - two or more instance networks? Set “use_single_default_gateway” in nova.conf
30. Performance Considerations and Bottlenecks
IO
20-40 instances per physical server causes high random IO
Reduce IO as much as possible - i.e. centralized logging
Can be further mitigated with Cinder
[Chart: Async Random IO. Series: randR, randW, randR (direct), randW (direct). Configurations compared: compute/host; compute/host (no ht); compute/host (deadline); test13 (deadline, cache=none); test13 (cfq, cache=none, no ht); test12 (cfq, cache=none); test12 (noop, cache=none); test13 (cfq, cache=writeback); test13 (noop, cache=writeback); test12 (cfq, host deadline, cache=none). Axis: 0 to 1600.]
[Chart: Host vs. Instance. compute/host compared with test12 (cfq, cache=none) for randR, randW, seqR, seqW, each also with direct IO. Axis: 0 to 14000.]
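
A rough per-instance IO budget shows why 20-40 instances on one host strain local disks. This is a rule-of-thumb sketch, not a benchmark: the 150 IOPS per drive and the 50/50 read/write mix are assumptions, and the RAID 10 write penalty of 2 is the usual approximation (each write lands on two mirrored disks).

```python
def raid10_effective_iops(disks, iops_per_disk, read_fraction):
    """Approximate array IOPS for a read/write mix on RAID 10."""
    raw = disks * iops_per_disk
    write_penalty = 2  # each logical write costs two physical writes
    write_fraction = 1.0 - read_fraction
    return raw / (read_fraction + write_fraction * write_penalty)


def per_instance_iops(array_iops, instances):
    """Even share of the array's IOPS across all instances on the host."""
    return array_iops / instances


# 8 drives as in the example hardware; 150 IOPS per drive is an assumption.
array = raid10_effective_iops(disks=8, iops_per_disk=150, read_fraction=0.5)
for count in (20, 40):
    print(count, "instances:", per_instance_iops(array, count), "IOPS each")
```

Under these assumptions each instance gets only a few tens of IOPS, which is why offloading work such as logging, or moving volumes to Cinder, matters.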
31. Final Thoughts
Lessons learned
Standardize on a design that works for your organization
Find the right questions to ask
Important to understand OpenStack as a whole
OpenStack is still changing often, keep up to date with current state of the projects
32. But....
But this is a design summit also
Open to discussions/thoughts/questions