Dev@Pulse 2014 Lightning Talk.
The talk covers how to use the IBM Cloud and NetflixOSS for high availability/automatic recovery, elastic and web scale, and high velocity continuous delivery. It also includes a live demo of chaos testing (Chaos Gorilla specifically) in which the application stayed available through an entire datacenter / availability zone outage.
Going Cloud Native with IBM Cloud and NetflixOSS for Dev@Pulse
1. Going cloud native for your
applications and services
Jerry Cuomo
Andrew Spyker
2. Topics
Jerry is going to cover
– Our Journey to Cloud Services
– Stop along the way, Winning Netflix Cloud Prize
– Our Goals in 2014 in delivering Cloud Services
@JerryCuomo
Andrew is going to
– Describe the “Zen, Methodology, Approach” to
building world-class services
– Highlight new capabilities to support this
methodology, running on IBM Cloud
– Prove this by example
@aspyker
3. Our Journey to Cloud Services
• From my blog
– http://bit.ly/cuomoblog
• In 2014, we will continue driving our
software to the cloud. To complement our
packaged software business, we are transforming
our development operations to also deliver our
wares as self-service cloud-native offerings within
the IBM Cloud (SoftLayer, Bluemix, PureApp).
• You know you have a cloud service if it is
addressable via URL, has Ts&Cs, and has an
operations team running it 24x7x365.
4. Acme Air and winning
the Netflix Cloud Prize
• Acme Air
– Cloud and Mobile Sample and Benchmark
• Acme Air + NetflixOSS + IBM SoftLayer
– IBM SoftLayer Port to embrace NetflixOSS platform
– Winner: Best Example Mash-Up Application Category
5. Cloud Services Goals
• We will follow the “Zen” of operating cloud services
• “We will rule the cloud, the cloud will not rule us”
– Proactive on failure and security testing and auto recovery
• Move from reactive model to predictive model
– We are always watching and anticipating
• Scalable service fabric services, ops excellence team
– Tools, libraries, services, and practices and COE for cloud
• Focus on key areas including
– Elastic and Web Scale
– High Availability and Automatic Recovery
– High Velocity Continuous Delivery
6. Elastic and Web Scale
Doing This
Not Doing That
Source: Programmableweb.com 2012
7. Elastic and Web Scale
Diagram: Load Balancers; Front end API (browser and mobile); Booking Service; Authentication Service; Temporal caching; Durable Storage
Strategy: Benefit
• Make deployments automated: Impossible without automation
• Expose well designed API to users: Offloads presentation complexity to clients
• Remove state for mid tier services: Allows easy elastic scale out
• Push temporal state to client and caching tier: Leverages clients, avoids data tier overload
• Use partitioned data storage: Data design and storage scales with HA
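The "partitioned data storage" row can be illustrated with a small sketch. This is only an assumption about what key-based partitioning might look like, not the Acme Air data layer: `shard_for`, `PartitionedStore`, and the shard names are hypothetical, and in-memory dicts stand in for real storage nodes.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical partition names


def shard_for(key, shards=SHARDS):
    """Map a key to a shard with a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized per
    process for strings, so every stateless mid-tier instance picks the same shard.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


class PartitionedStore:
    """In-memory stand-in for a partitioned data tier."""

    def __init__(self, shards=SHARDS):
        self._shards = list(shards)
        self._data = {name: {} for name in self._shards}

    def put(self, key, value):
        self._data[shard_for(key, self._shards)][key] = value

    def get(self, key):
        return self._data[shard_for(key, self._shards)].get(key)
```

Because the mapping depends only on the key, any mid-tier instance can serve any request without holding state of its own. Simple modulo hashing remaps most keys when the shard count changes; consistent hashing is a common alternative when shards are added or removed often.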
9. Highly Available Service Runtime Recipe
Diagram: Web App Front End; Hystrix; Execute auth-service call (REST services); Call “Auth Service”; Ribbon REST client with Eureka; Fallback Implementation; Eureka Server(s); App Service (auth-service) Server(s); Karyon; Micro service Implementation
Implementation Detail: Benefits
• Decompose into micro services
– Key user path always available
– Failure does not propagate across service boundaries
• Karyon with automatic Eureka registration
– New instances are quickly found
– Failing individual instances disappear
• Ribbon client with Eureka awareness
– Load balances & retries across instances with “smarts”
– Handles temporal instance failure
• Hystrix as dependency circuit breaker
– Allows for fast failure
– Provides graceful cross service degradation/recovery
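The two client-side ideas in this recipe, retrying across instances and failing fast with a fallback, can be sketched in a few lines. This is a language-neutral illustration in Python and does not use the Ribbon or Hystrix APIs; `CircuitBreaker`, `call_with_retries`, the thresholds, and the injected `clock` are all hypothetical names chosen for the example.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: open after N consecutive failures, retry after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after_s=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def is_open(self):
        if self._opened_at is None:
            return False
        # Once the cool-down has passed, the next call is allowed through as a trial.
        return self._clock() - self._opened_at < self.reset_after_s

    def call(self, primary, fallback):
        if self.is_open():
            return fallback()  # fail fast without touching the dependency
        try:
            result = primary()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            return fallback()
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def call_with_retries(instances, request):
    """Try each known instance in turn; raise if all of them fail."""
    last_error = None
    for instance in instances:
        try:
            return request(instance)
        except Exception as err:  # a real client would catch narrower errors
            last_error = err
    raise RuntimeError("all instances failed") from last_error
```

In a setup like the one on this slide, the instance list would come from the service registry, and the fallback would return a degraded but usable answer so that a failing dependency does not take down the key user path.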
10. IaaS High Availability
Diagram: Global Load Balancers; Region (Dallas); Datacenters DAL01, DAL05, DAL06; Eureka; Local LBs; Web App; Auth Service; Booking Service; Cluster Auto Recovery and Scaling Services
Rule: Why?
• Always > 2 of everything: 1 is a SPOF; 2 doesn’t web scale and makes DR recovery slow
• Including IaaS and cloud services: You’re only as strong as your weakest dependency
• Use auto scaler/recovery monitoring: Clusters guarantee availability and service latency
• Use application level health checks: Instance on the network != healthy
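The rules above can be combined into a small recovery check: count only instances that pass an application-level check, and ask for replacements whenever a service drops below three. This is a minimal sketch under assumed data shapes, not the auto recovery service used in the talk; `healthy_instances`, `replacements_needed`, and the dict-based instance records are hypothetical.

```python
from collections import Counter

MIN_PER_SERVICE = 3  # "always > 2 of everything"


def healthy_instances(instances, app_check):
    """Keep only instances whose application-level check passes.

    Being reachable on the network is not enough; app_check should exercise
    the service itself, for example a health endpoint that touches its dependencies.
    """
    healthy = []
    for inst in instances:
        try:
            if app_check(inst):
                healthy.append(inst)
        except Exception:
            pass  # a check that errors out counts as unhealthy
    return healthy


def replacements_needed(instances, app_check, minimum=MIN_PER_SERVICE):
    """Return {service: number of instances to launch} to get back to the minimum."""
    services = {inst["service"] for inst in instances}
    counts = Counter(inst["service"] for inst in healthy_instances(instances, app_check))
    return {s: minimum - counts[s] for s in services if counts[s] < minimum}
```

A monitoring loop could call `replacements_needed` periodically and launch the reported number of instances per service.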
11. Let’s prove it
• What if you lost a random instance?
– Demonstrated as part of Netflix Cloud Prize (bit.ly/noss-sl-blog)
• What if you lost a whole datacenter?
– DEMO TIME!
22. DEMO Success!
• Online video (shows recovery as well): http://bit.ly/sl-gorillavid
Diagram: Global Load Balancers; Region (Dallas); Datacenters DAL01, DAL05, DAL06; Eureka; Local LBs; Web App; Auth Service; Booking Service; Cluster Auto Recovery and Scaling Services; Chaos Gorilla; ✗
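The property this demo checks can be stated as a small simulation: remove every instance in one datacenter and confirm that each service still has capacity somewhere else. The sketch below works on an in-memory list and is not the Chaos Gorilla tool itself; the function names and the dict-based fleet are hypothetical, while the datacenter and service names are taken from the slides.

```python
def kill_datacenter(instances, datacenter):
    """Return the fleet as it looks after every instance in one datacenter is lost."""
    return [inst for inst in instances if inst["dc"] != datacenter]


def services_without_capacity(instances, required_services):
    """List the required services that have no instance left anywhere."""
    alive = {inst["service"] for inst in instances}
    return sorted(set(required_services) - alive)


def survives_any_single_dc_loss(instances, required_services):
    """True if every required service survives the loss of any one datacenter."""
    datacenters = {inst["dc"] for inst in instances}
    return all(
        not services_without_capacity(kill_datacenter(instances, dc), required_services)
        for dc in datacenters
    )


# Example fleet: one instance of each service in each Dallas datacenter.
SERVICES = ["web-app", "auth-service", "booking-service"]
FLEET = [
    {"dc": dc, "service": service}
    for dc in ("DAL01", "DAL05", "DAL06")
    for service in SERVICES
]
```

With this fleet, `survives_any_single_dc_loss(FLEET, SERVICES)` holds; a fleet that keeps a service in only one datacenter fails the check as soon as that datacenter is removed.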
24. Continuous Delivery
Diagram: Continuous Build Server; Baked to SoftLayer Image Templates; Cluster v1; Canary v2; Cluster v2
Step: Technology
• Developers test locally: Unit test frameworks
• Continuous build: Continuous build server based on Gradle builds
• Build “bakes” full instance image: Imaginator (Aminator inspired) creates SoftLayer images
• Developers work across dev and test: Archaius allows for environment based context
• Developers do canary tests, red/black deployments in prod: Asgard console provides app cluster common devops approach, security patterns, and visibility
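The canary and red/black step can be read as two decisions: is the canary at least as healthy as the running cluster, and which cluster should receive traffic afterwards. The sketch below is one possible way to express that logic and is not how Asgard implements it; `canary_passes`, `red_black_switch`, and the 1% tolerance are assumptions made for illustration.

```python
def canary_passes(baseline_errors, baseline_requests,
                  canary_errors, canary_requests, tolerance=0.01):
    """Accept the canary if its error rate is within `tolerance` of the baseline's."""
    if baseline_requests == 0 or canary_requests == 0:
        return False  # not enough traffic to judge
    baseline_rate = baseline_errors / baseline_requests
    canary_rate = canary_errors / canary_requests
    return canary_rate <= baseline_rate + tolerance


def red_black_switch(active, candidate, candidate_healthy):
    """Return (cluster receiving traffic, cluster kept on standby for rollback)."""
    if candidate_healthy:
        return candidate, active
    return active, candidate
```

Keeping the previous cluster on standby instead of deleting it is what makes rollback a traffic switch and not a redeploy.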
25. More details?
• PAS-1418A - Porting the Netflix OSS Cloud
Architecture to SoftLayer
– Today - 5:00 – 6:00, Room 116
• All code available on Github
– netflix.github.io
– github.com/EmergingTechnologyInstitute
– Blog - iSpyker.blogspot.com
– Twitter - @aspyker