Our question was: "What is the best infrastructure to run our SaaS on?" We tried most of the infrastructure software, starting with VMware and moving on to Proxmox, OpenNebula, OpenStack, and Google's Ganeti. We also considered software-defined storage and software-defined networking. We mastered some of these technologies and even contributed to the projects. But we felt we needed something different and lighter, so we "blended" our own solution, mixing and matching the best of the above while keeping in mind our needs in terms of computing, network and storage. This talk goes through our (long) journey, weighing the pros and cons of each technology and describing what we used to deliver SecurePass and other services.
Why We Tried (and Ignored) Famous IaaS To Deliver SecurePass
1. 31st January 2015
Giuseppe Paternò
@gpaterno
https://www.flickr.com/photos/kewl/8475764430
2. Knowing
“GIPPA”...
CTO of GARL
Swiss company behind the SecurePass cloud identity management
service. GARL is mostly focused on identity and security
Trusted advisor for customers on cloud and complex OSS
architectures: OpenStack, CloudStack, OpenNebula & Ganeti
Previously Senior Solution Architect at Canonical, Red Hat, Sun
Microsystems and IBM.
I have worked with Linux since 1996. In my (little) spare time, I publish
books and whitepapers
3. MAKING THE CLOUD A SAFER SPACE
IT security products and cloud services focused on
identity protection on the Cloud. Born from Symantec,
conducting pentest and vulnerability assessment on their
behalf in EMEA.
Most customers are in finance and telecom operators. HQ
based in Switzerland (Lugano and Zurich), with an office in
London.
User privacy is protected by strict Swiss privacy
regulations, no EU or US exceptions allowed.
GARL?? What is that?
5. SecurePass is your Swiss Army knife to protect
and manage identity in the cloud: a suite
of integrated tools that allow web apps,
operating systems and devices to quickly manage users
and secure access.
Cloud Identity Management (LDAP)
Strong authentication (RADIUS)
Web Single Sign-On (CAS)
Federation (SAML)
Central logging platform (next gen)
Supported in the distribution by:
[Figure: One Time Password (sample code 345227), Identity Management, Single Sign-On]
SecurePass, the cloud identity platform
6. Datacenter 1 (Active): Switzerland
Datacenter 2 (Active): Italy
Datacenter 3 (Active): Switzerland (former military premise, secret location)
High-speed secure replication among all sites
Off-site global load balancers
Multi-datacenter Global Secure Architecture
8. MySQL (Billing)
Keepalived + LVS
LDAP Responder
Data Node (Cassandra + Python wrappers)
Master Keys
LDAP is OpenLDAP plus a custom backend plugin
Keepalived is the load balancer; HAProxy cannot be used because it lacks UDP support
MySQL is used mostly for billing and holds no actual user data
Master keys are kept in a secure location
Data nodes run Cassandra wrapped in Python for internal APIs and crypto
Site Overview
Mixed use of CentOS 6 and Debian Wheezy/Squeeze (being updated)
Mostly Python
Custom-made health checks that simulate OTP requests
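A health check that simulates an OTP request could look roughly like the sketch below. The slides do not say which OTP algorithm or probe protocol SecurePass uses, so this assumes a time-based OTP as specified in RFC 6238 and a hypothetical `authenticate` callable that stands in for the real request (for example a RADIUS login). The function returns 0 on success so that it could back a keepalived MISC_CHECK script, which treats exit status 0 as healthy.

```python
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, at: float, digits: int = 6, step: int = 30) -> str:
    """Compute a time-based OTP (RFC 6238, HMAC-SHA1 variant)."""
    counter = int(at // step)
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation as defined in RFC 4226.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def health_check(authenticate, user: str, secret: bytes, now=None) -> int:
    """Return 0 if a probe login with a fresh OTP succeeds, 1 otherwise.

    `authenticate(user, code)` is a hypothetical callable; it is an
    assumption of this sketch, not part of SecurePass.
    """
    now = time.time() if now is None else now
    try:
        ok = bool(authenticate(user, totp(secret, now)))
    except Exception:
        # Any error while probing counts as an unhealthy node.
        ok = False
    return 0 if ok else 1
```

A wrapper script would call `sys.exit(health_check(...))` with a dedicated probe account, so that a failing node drops out of the LVS pool.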
10. Feature rich (vSphere HA, vMotion, DRS, I/O control)
Very large ecosystem
All OS vendors support and certify their products on ESXi
ESXi can be downloaded and used freely
Per-core license: expensive
Proprietary platform
ESXi cannot be accessed through its APIs without buying licenses
Latest versions have room for improvement: even simple tasks are getting complicated
VMware
Pros/Cons
12. Used to contribute patches
Very similar to the VMware approach
Free and open source, based on Debian and Perl
Has an initial concept of software-defined storage (Ceph), but it doesn't work very well
Some weaknesses on the networking side: managing multiple VLANs is complex, and bridges must be created by hand
Doesn't play nicely with more than two nodes in the cluster (in our experience back in 2012)
Doesn't work well on some hardware; the kernel is modified from stock Debian (and sometimes broken)
Community revolves around two brothers in Vienna, and with one of them there are "relational problems"
Proxmox
Pros/Cons
14. Probably the most successful example of an open source project after Linux
Support from many OEMs and OS vendors
Interoperability with many components: just pick your favorite one and plug it in
Standard and well-accepted APIs
Very complex to set up and troubleshoot
Despite a common codebase, implementations might differ
Needs a high number of management nodes
Difficult to maintain for a small team, not worth it just to run the same application
OpenStack
Pros/Cons
16. Lightweight architecture
Can start with a single node and scale out easily
Designed to use local storage and cheap storage (like Ceph)
Great for "standard" Linux and Windows workloads
Easy to pick up for a standard Linux sysadmin
HA of the master needs to be triggered from the monitoring platform
Lacks some features (e.g. storage vMotion)
Starting to become complex from a code perspective
Release cycles too short (can't upgrade every 3 months!)
GANETI
Pros/Cons
18. We like the templating idea
Great for deploying frontends and upgrading them easily
... but:
(For us) it doesn't make sense, as we already have RPM packages for our software and kickstarts
Security concerns about running a single kernel image
Pros/Cons
20. Stable virtualization platform (KVM preferred)
Works on a broad range of hardware, including refurbished machines
HA not needed in the platform (performed at application level)
Trigger creation of virtual machines through scripts or APIs
Flexible VLANs, without needing to reconfigure the platform
Backup of the VMs not needed
Cheap storage solution
Our requirements
21. Built our own flavor with:
Base CentOS 6
Open vSwitch from RDO
GlusterFS from gluster.org (EPEL release, only for the control nodes)
Corosync/Pacemaker (only for control nodes)
What we did
https://www.flickr.com/photos/stickkim/7377611424
22. Control Nodes
(libvirt/KVM + Open vSwitch
+ Corosync + Gluster)
Service Node
(Libvirt/KVM + Open vSwitch)
Service Node
(Libvirt/KVM + Open vSwitch)
Service Node
(Libvirt/KVM + Open vSwitch)
Switching
fabric
(single stack)
Control node hosts:
Firewalls
CentOS/Debian Mirrors
Kickstart generator + PXE
Custom repositories
Puppet (migrating to Ansible)
Backups (Data + Control VM)
Current deployed architecture
23. Gluster
VM datastore
Gluster
Backup Area
OVS Network
VM VM VM VM
VM VM VM VM
libvirt
CoroSync/
Pacemaker
Bootstrap
& mgmt VM
Central management/Orchestration
(PXE + Mirror + Puppet)
Control Nodes
24. <network connections='3'>
  <name>ovs-network</name>
  <uuid>c162d855-eae3-26e8-9cf4-27fffc10faa0</uuid>
  <forward mode='bridge'/>
  <bridge name='ovsbridge0' />
  <mac address='52:54:00:CA:E5:4B'/>
  <virtualport type='openvswitch'/>
  <portgroup name='mgmt' default='yes'>
    <vlan>
      <tag id='10'/>
    </vlan>
  </portgroup>
  <portgroup name='publicnet'>
    <vlan>
      <tag id='1000'/>
    </vlan>
  </portgroup>
</network>
<interface type='network'>
  <mac address='52:54:00:49:95:c0'/>
  <source network='ovs-network' portgroup='publicnet'/>
  <model type='e1000'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</interface>
portgroup in
libvirt
VLAN tag in VM
Open vSwitch in libvirt
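Since VMs are meant to be created through scripts or APIs, the interface definition above can be generated rather than written by hand. The following is a minimal sketch using only Python's standard library; the helper name `interface_xml` is hypothetical, and the network and portgroup names are the ones shown on the slide.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def interface_xml(network: str, portgroup: str,
                  mac: Optional[str] = None, model: str = "e1000") -> str:
    """Build a libvirt <interface> element bound to a network portgroup.

    The portgroup selects the VLAN tag on the libvirt side, so the guest
    definition never needs to carry a VLAN ID itself.
    """
    iface = ET.Element("interface", type="network")
    if mac:
        # Without a MAC element libvirt assigns an address on its own.
        ET.SubElement(iface, "mac", address=mac)
    ET.SubElement(iface, "source", network=network, portgroup=portgroup)
    ET.SubElement(iface, "model", type=model)
    return ET.tostring(iface, encoding="unicode")


if __name__ == "__main__":
    print(interface_xml("ovs-network", "publicnet", mac="52:54:00:49:95:c0"))
```

The resulting fragment could be embedded in a domain definition or saved to a file and attached with `virsh attach-device`. Moving a VM to another VLAN then only means choosing a different portgroup name.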
25. 98755c2a-d40d-4e5d-a8f7-e0bedeaca2fa
    Bridge "ovsbridge0"
        Port "ovsbridge0"
            Interface "ovsbridge0"
                type: internal
        Port "vnet0"
            tag: 10
            Interface "vnet0"
        Port "vnet2"
            Interface "vnet2"
        Port ovsbond
            Interface "eth0"
            Interface "eth1"
        Port "vnet1"
            tag: 1000
            Interface "vnet1"
        Port "vnet3"
            tag: 1000
            Interface "vnet3"
    ovs_version: "2.1.3"
VLAN tags
Open vSwitch VLAN tag propagation
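To verify that the portgroup tags really reach the switch, output like the one above can be checked from a script. This is a small sketch that maps each port to its VLAN tag; the function name `port_vlans` is hypothetical, and the sample text is a shortened copy of the listing on this slide. In practice the text would come from running `ovs-vsctl show` on the node.

```python
import re

SAMPLE = '''
Bridge "ovsbridge0"
    Port "ovsbridge0"
        Interface "ovsbridge0"
            type: internal
    Port "vnet0"
        tag: 10
        Interface "vnet0"
    Port ovsbond
        Interface "eth0"
        Interface "eth1"
    Port "vnet1"
        tag: 1000
        Interface "vnet1"
'''


def port_vlans(show_output: str) -> dict:
    """Map each port name to its VLAN tag (None for untagged/trunk ports)."""
    vlans = {}
    current = None
    for raw in show_output.splitlines():
        line = raw.strip()  # indentation is not needed for this parse
        port = re.match(r'Port\s+"?([^"]+)"?$', line)
        if port:
            current = port.group(1)
            vlans[current] = None
            continue
        tag = re.match(r'tag:\s*(\d+)$', line)
        if tag and current is not None:
            vlans[current] = int(tag.group(1))
    return vlans


if __name__ == "__main__":
    for name, vlan in port_vlans(SAMPLE).items():
        print(name, vlan)
```

Ports without a `tag:` line, such as the bond carrying `eth0` and `eth1`, are reported as `None`, which matches their role as untagged or trunk ports.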