1. Madrid Devops March 2013
● AWS meeting
● Socialife: successful case in AWS
● Juan Vicente Herrera Ruiz de Alejo
● Service Operations Manager at Lumata
● @jvicenteherrera
● http://www.linkedin.com/in/jvherrera
2. Socialife - The project
●Social feed aggregation/recommendation app preinstalled on all Sony devices (also available in the Play Store)
●Client developed by Sony Japan
●We develop and manage the APIs that provide data to the client
●All feeds are processed and stored on our platform
●The system analyzes the data and recommends other feeds to you
●By the end of 2013 we expect around 1,000,000 new users registered on the platform and 170,000 DAU
●All servers are in AWS and the deployments and
configuration management are handled by Chef.
●Nexus and Jenkins are used for CI.
4. System stats
● Components
– Custom API (Java)
– RabbitMQ
– Redis
– MongoDB (sharding)
– Splunk
– Varnish
– Apache
– Alfresco
● EC2
– Production env (reserved instances): 43 nodes with the current DAU; on-demand Beanstalk instances for scale-out
– Staging env: 30 nodes (reserved instances for ½ day)
– 10 Load Balancers
– 25 Security Groups
– 15 Key Pairs
– US East region
● S3
– 2 buckets
● IAM
– Multi-Factor Authentication device (virtual token on smartphones)
– 3 Groups
– 18 Users
● VPC
– 1 VPC (2 in the future)
– 6 Subnets
– 5 Route Tables
– 7 Network ACLs
– 10 Elastic IPs
– 1 Internet Gateway
– 1 Virtual Private Gateway
– 1 Customer Gateway
– 1 VPN Connection
6. Advantages
● Our APIs are stateless, so scaling out is very easy. Nodes are created by Chef.
● Performance testing is very easy thanks to the vertical scalability EC2 provides: you can quickly create nodes with more CPU, RAM or I/O when you need them.
● The outage recovery plan is handled with node snapshots (MongoDB) or with Chef (the other, stateless nodes)
● Good user management through virtual MFA, IAM, key pairs, certificates and user credentials
● Good security with ACLs and Security Groups
● Good integration with Chef: Chef bootstraps the machines
● Amazon support provides rapid response and customized consulting for the project.
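Why stateless APIs make scaling out easy can be shown with a minimal Python sketch, assuming a simple round-robin dispatcher. `RoundRobinPool` and the node names are hypothetical, not part of the Socialife code: because no node holds session state, a freshly bootstrapped node can start serving requests as soon as it joins the pool.

```python
from itertools import cycle


class RoundRobinPool:
    """Hypothetical dispatcher in front of stateless API nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next = cycle(self.nodes)

    def add_node(self, node):
        # A node bootstrapped by Chef can take traffic at once: there is no
        # session data to copy, so we simply restart the rotation including it.
        self.nodes.append(node)
        self._next = cycle(self.nodes)

    def route(self):
        # Any node can serve any request, so just take the next one in turn.
        return next(self._next)
```

For example, a pool of `api-1` and `api-2` alternates between them; after `add_node("api-3")` the next three requests go to `api-1`, `api-2` and `api-3`. With stateful nodes the same change would also require sticky sessions or session replication.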
7. Disadvantages
● You must adapt to the instance sizes, whose resources (CPU, RAM...) are predefined and not customizable
● You have no control over the evolution of the products your service depends on
● You don't have access to the logs of some
instances (for example load balancers)
● Risk of lock-in to AWS services and the consequent difficulty of migrating to another data center.
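The first disadvantage, fixed instance sizes, usually means paying for headroom you do not need. This minimal Python sketch picks the smallest instance type that fits a workload and reports what is left unused. It assumes an illustrative table of 2013-era m1 instance types; the figures should be checked against the AWS documentation, and `smallest_fit` is a hypothetical helper.

```python
# Illustrative 2013-era instance types: (name, vCPUs, RAM in GiB).
# Check the AWS documentation for current types and figures.
INSTANCE_TYPES = [
    ("m1.small", 1, 1.7),
    ("m1.medium", 1, 3.75),
    ("m1.large", 2, 7.5),
    ("m1.xlarge", 4, 15.0),
]


def smallest_fit(vcpus_needed, ram_gib_needed):
    """Return (type, spare vCPUs, spare GiB) for the first type that fits, or None."""
    for name, vcpus, ram in INSTANCE_TYPES:  # ordered from smallest to largest
        if vcpus >= vcpus_needed and ram >= ram_gib_needed:
            return name, vcpus - vcpus_needed, ram - ram_gib_needed
    return None  # nothing fits: the workload has to be split across nodes
```

A workload needing 2 vCPUs and 4 GiB lands on `m1.large` and leaves 3.5 GiB of RAM idle, because there is no predefined size in between.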
8. Recommendations
● It is strongly recommended to run servers in more than one availability zone to avoid total downtime in case of an outage
● Analyze performance tests to choose the minimum number of nodes that will run 24/7 and the instance sizes to reserve. Reserved instances reduce the cost to 2/3.
● It is advisable to run a large number of small instances close to 100% CPU usage, rather than a few powerful machines with wasted resources, and to launch new nodes and balance requests among them when load increases.
● Pre-warm the load balancers
● Ask AWS support to raise the initial limit on the number of EC2 instances that can run simultaneously (20)
● For certain services, use TCP instead of HTTP load balancing. Balancing requests to the different nodes of our APIs over TCP internally solved some problems with HTTP requests that did not close their sessions. We only use HTTP balancing for requests that reach the public Apache.
● Use CloudFormation to create network resources
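The recommendation to run servers in more than one availability zone can be sketched in Python, assuming nodes are spread round-robin across zones. `spread_across_azs`, `surviving_nodes` and the node names are hypothetical; the zone names only follow the us-east-1 naming pattern. Losing one zone then removes part of the capacity instead of all of it.

```python
def spread_across_azs(nodes, azs):
    """Place nodes round-robin across availability zones."""
    placement = {az: [] for az in azs}
    for i, node in enumerate(nodes):
        placement[azs[i % len(azs)]].append(node)
    return placement


def surviving_nodes(placement, failed_az):
    """Nodes still running if one whole zone goes down."""
    return [n for az, nodes in placement.items() if az != failed_az for n in nodes]
```

With six nodes over three zones, an outage in one zone leaves four nodes serving; with all six in a single zone, the same outage leaves none.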
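The reserved-instance recommendation can be turned into a small cost calculation. This Python sketch assumes the slide's "reduce the cost to 2/3" means a reserved node-hour costs 2/3 of an on-demand node-hour. It ignores upfront fees and uses a made-up hourly load profile; `total_cost` and `best_baseline` are hypothetical helpers. It searches for the number of nodes to reserve 24/7 that minimizes the total.

```python
from fractions import Fraction

# Assumption: a reserved node-hour costs 2/3 of an on-demand node-hour
# (the slide's "reduce the cost to 2/3"); upfront fees are ignored.
RESERVED_FACTOR = Fraction(2, 3)


def total_cost(hourly_load, baseline, on_demand_rate):
    """Cost of reserving `baseline` nodes for the whole period plus on-demand extras.

    hourly_load: number of nodes needed in each hour of the period.
    """
    hours = len(hourly_load)
    reserved = baseline * hours * on_demand_rate * RESERVED_FACTOR
    # Node-hours above the reserved baseline are paid at the on-demand rate.
    extra = sum(max(0, need - baseline) for need in hourly_load)
    return reserved + extra * on_demand_rate


def best_baseline(hourly_load, on_demand_rate):
    """Number of nodes to reserve that minimizes total_cost."""
    return min(range(max(hourly_load) + 1),
               key=lambda b: total_cost(hourly_load, b, on_demand_rate))
```

With a hypothetical day of 40 nodes for 20 hours and 60 nodes for 4 hours (rate 1), the cheapest choice is to reserve the 40 nodes that run all day: 720 rate units versus 1040 fully on demand. A node that runs only 4 of 24 hours is cheaper on demand, which is why the performance tests should establish the 24/7 minimum.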
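The advice to run many small instances near full CPU and add nodes as load grows, together with the initial limit of 20 simultaneous EC2 instances, can be sketched as a capacity calculation. Assumptions: load and per-node capacity use the same unit (for example requests per second), the 90% utilization target and the minimum of 2 nodes are hypothetical defaults, and `plan_nodes` is a hypothetical helper.

```python
def plan_nodes(total_load, capacity_per_node, target_pct=90,
               min_nodes=2, instance_limit=20):
    """Return (nodes_to_run, hit_limit) for a scale-out decision.

    Each node is loaded up to target_pct percent of its capacity; integer
    ceiling division avoids floating-point rounding surprises.
    """
    needed = -(-total_load * 100 // (capacity_per_node * target_pct))
    needed = max(needed, min_nodes)
    # If the account limit caps the plan, the limit must be raised via support.
    return min(needed, instance_limit), needed > instance_limit
```

`plan_nodes(1000, 100)` returns `(12, False)`, while `plan_nodes(3000, 100)` returns `(20, True)`: 34 nodes would be needed, which is the signal to ask AWS support for a higher limit before the load arrives.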
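The last two recommendations, TCP balancing for internal API traffic and CloudFormation for network resources, can be combined in one sketch that builds a CloudFormation template as a Python dict. The resource types and property names (`AWS::EC2::VPC`, `AWS::EC2::Subnet`, `AWS::EC2::InternetGateway`, `AWS::EC2::VPCGatewayAttachment`, `AWS::ElasticLoadBalancing::LoadBalancer` with `Listeners`) are CloudFormation's own. The CIDR blocks, zones, ports and logical names are hypothetical, and a deployable template would also need route tables, security groups, instances and health checks, which are omitted here.

```python
import json

# Hypothetical CIDR blocks and zones, for illustration only.
SUBNETS = [("10.0.1.0/24", "us-east-1a"), ("10.0.2.0/24", "us-east-1b")]


def load_balancer(public, port, subnet_refs):
    """Classic ELB resource: HTTP for the public Apache, TCP for internal APIs."""
    protocol = "HTTP" if public else "TCP"
    return {
        "Type": "AWS::ElasticLoadBalancing::LoadBalancer",
        "Properties": {
            "Scheme": "internet-facing" if public else "internal",
            "Subnets": subnet_refs,
            "Listeners": [{
                "LoadBalancerPort": str(port),
                "InstancePort": str(port),
                "Protocol": protocol,
                "InstanceProtocol": protocol,
            }],
        },
    }


def build_template():
    resources = {
        "VPC": {"Type": "AWS::EC2::VPC",
                "Properties": {"CidrBlock": "10.0.0.0/16"}},
        "InternetGateway": {"Type": "AWS::EC2::InternetGateway"},
        "GatewayAttachment": {
            "Type": "AWS::EC2::VPCGatewayAttachment",
            "Properties": {"VpcId": {"Ref": "VPC"},
                           "InternetGatewayId": {"Ref": "InternetGateway"}},
        },
    }
    subnet_refs = []
    # One subnet per availability zone, so balancers span several zones.
    for i, (cidr, az) in enumerate(SUBNETS, start=1):
        name = f"Subnet{i}"
        resources[name] = {
            "Type": "AWS::EC2::Subnet",
            "Properties": {"VpcId": {"Ref": "VPC"}, "CidrBlock": cidr,
                           "AvailabilityZone": az},
        }
        subnet_refs.append({"Ref": name})
    resources["PublicApacheELB"] = load_balancer(True, 80, subnet_refs)
    resources["InternalApiELB"] = load_balancer(False, 8080, subnet_refs)
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```

Generating the template from code keeps the network layout versioned alongside the Chef cookbooks, and the TCP/HTTP choice per balancer is made in one place.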