What would you do if, shortly after migrating the majority of your workloads to the public cloud, you began to struggle with unexpected cost increases, a lack of visibility, and budgets overrun by hundreds of thousands of dollars? This is what happened to StepStone, one of the largest online job boards in Europe.
3. WHO ARE STEPSTONE?
• Successful online job board business in 20 countries.
• 60 million visits / 600,000 jobs advertised per month.
• 29 million resumes.
• 24 million job alert subscribers.
• 2,200+ employees.
• Latest acquisition: Universum (Sweden).
5. WHO AM I?
• 20 years' experience in Software Engineering
• Head of Development for NHS Jobs
The NHS employs 1.3 million staff (5th largest employer in the world)
• Led migration of Jobsite to AWS
Key success criteria: “Don’t destroy the business”
“Right First Time”
• Now Group AWS Programme Manager at StepStone.
7. TOTALJOBS: FIRST TO MIGRATE TO AWS
• Acquired by StepStone in 2012.
• Deadline set by previous owners to vacate their Data Centre.
• Decision made to migrate to AWS.
• ‘Lift and shift’ / ‘Forklift’ strategy.
• Successful – leading the way for further group adoption
8. OUR GROWTH IN AWS (UP TO JULY 2017)
[Chart: AWS Hosting Costs for StepStone Group, Jan-15 to Jul-17, annotated with milestones: StepStone Brand build starts; StepStone Brand launches; StepWeb; Data Wizards & StepMatch; Workray and Good & Co; Career Junction; Saongroup; Jobsite.]
9. TOO FAST!
[Chart: AWS Hosting Costs for StepStone Group, Jan-15 to Jul-17. Annotation: "Costs had been predicted to flatten out here for the year…"]
10. 2017: COST EXPLOSION!
• Budget considered ‘blown’ by mid-2017.
• Major concern to stakeholders (e.g. CFO):
“Costs out of control”
“We didn’t have this with the Data Centre”
“How can we forecast with this happening?”
• Finance teams frustrated with lack of accuracy.
• No group oversight.
11. … AND ANOTHER THING…
• Brands operating in silos.
• Development teams in UK, Germany, Poland, South Africa, South America, United States…
• Security and Cloud Architectural Best Practices – are they being used and checked?
• DevOps: Are we building and releasing things in the best way for the cloud?
• Have we moved on from ‘lift and shift’ migrations?
14. SECURITY
• Standards agreed in sessions with the Account Owners across the entire Group.
• Monitored / enforced by Group Security Team.
• Examples:
MFA everywhere.
SSO.
Akamai integration (Application Firewall, DDoS protection).
• Agreed timetables for implementation.
• Areas to target agreed within the Group each quarter.
15. COST CONTROL
• All accounts have a responsible Account Owner.
• Monthly budget meetings backed with an agreed forecasting process.
• Whole year forecasted and refined as the year progresses.
• Any exceptions (typically 10% variance) followed up: Full explanation and plan of action required.
• CloudHealth rolled out everywhere to enable cost tracking, analysis and alerting (enterprise-level dashboard tools).
• Group Target for Reservations…
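The variance rule above can be sketched as a small automated check. This is a minimal illustration only, assuming monthly forecast and actual figures per account are available as plain numbers. The account names, owners, amounts and the `flag_exceptions` helper are hypothetical, the 10% default comes from the "typically 10% variance" rule, and treating underspend as an exception too is an assumption. It does not use the CloudHealth API.

```python
from dataclasses import dataclass


@dataclass
class AccountSpend:
    account: str      # hypothetical account name
    owner: str        # responsible Account Owner
    forecast: float   # forecast spend for the month
    actual: float     # actual spend for the month


def variance(spend: AccountSpend) -> float:
    """Signed variance of actual against forecast, as a fraction of forecast."""
    if spend.forecast <= 0:
        raise ValueError(f"forecast for {spend.account} must be positive")
    return (spend.actual - spend.forecast) / spend.forecast


def flag_exceptions(spends, threshold=0.10):
    """Return (account, owner, variance) for accounts beyond the threshold.

    Overspend and underspend are both flagged (an assumption: either one
    means the forecast was inaccurate). The 10% default mirrors the slide.
    """
    return [
        (s.account, s.owner, variance(s))
        for s in spends
        if abs(variance(s)) > threshold
    ]


if __name__ == "__main__":
    # Hypothetical figures for one month.
    month = [
        AccountSpend("brand-a", "alice", forecast=100_000, actual=104_000),
        AccountSpend("brand-b", "bob", forecast=80_000, actual=95_000),
        AccountSpend("brand-c", "carol", forecast=50_000, actual=42_000),
    ]
    for account, owner, var in flag_exceptions(month):
        print(f"{account}: {var:+.1%} vs forecast, follow up with {owner}")
```

Each flagged account would then need the full explanation and plan of action described above.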
18. COST CONTROL: WHAT ABOUT SPOT?
• Spot instances: EC2 savings of up to 90%.
• Ideal for certain application types, e.g. Big Data processing.
• Case study: StepWeb.
• 6 stage pipeline, 5.5 TB database, circa 100 x r3.8xlarge instances.
• EMR was running monthly On Demand.
• Migrated to Spot.
• 87% savings.
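As a rough illustration of how a savings figure like the one above is calculated, the sketch below compares the On Demand and Spot cost of a batch job. The hourly prices and run time are hypothetical placeholders, chosen so the result lands on the quoted saving. They are not StepStone's figures or real AWS prices, and actual Spot prices vary by region, Availability Zone and time.

```python
def job_cost(instances: int, hours: float, hourly_price: float) -> float:
    """Total compute cost of a job running `instances` for `hours` each."""
    return instances * hours * hourly_price


def savings_fraction(on_demand_cost: float, spot_cost: float) -> float:
    """Fraction saved by running on Spot instead of On Demand."""
    if on_demand_cost <= 0:
        raise ValueError("on_demand_cost must be positive")
    return 1.0 - spot_cost / on_demand_cost


if __name__ == "__main__":
    # All numbers below are hypothetical placeholders, not real prices.
    instances = 100
    hours = 10.0
    on_demand_price = 2.00   # assumed $/hour per instance
    spot_price = 0.26        # assumed $/hour per instance

    od = job_cost(instances, hours, on_demand_price)
    spot = job_cost(instances, hours, spot_price)
    print(f"On Demand: ${od:,.2f}  Spot: ${spot:,.2f}  "
          f"saving: {savings_fraction(od, spot):.0%}")
```

The calculation ignores Spot interruptions. A job that has to re-run work after an interruption uses more instance hours, so the realised saving can be lower than the price difference alone suggests.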
19. COST CONTROL: HOW DID 2018 TURN OUT?
• 2018 was a much better year.
• We came in comfortably under budget with more accurate and explained forecasting.
• ‘No surprises’ concept for Stakeholders (CTO, CFO, Finance Teams…)
20. BEST PRACTICE
• Monthly Community of Practice (online) sessions: Account
Owners, DevOps teams… anyone who is interested!
• Quarterly Workshops (location rotates): More detailed
presentations, including third-party guests. Broadcast live
+ recorded for later.
• Presentations include: Reservations, Spot Instances, Security, DevOps, project walkthroughs… and more.
• Spreads Best Practice throughout the Group, learning from both inside and outside.
• Slack used extensively, including third-party guests for ‘instant’ access.
21. BUZZWORD BINGO: GAMIFICATION
• Teams invited to enter our yearly competition
• Entries assessed on all of the things I have mentioned!
• Winning = £££££.
23. EVERYONE MUST GET CERTIFIED
• All users who can make changes to Production
environments need to be Certified to a minimum of Associate standard
(EOY 2019)
• Not a difficult standard for anyone ‘hands on’
but it ensures principles of cloud architecture
and best design are understood.
• Variety of training methods available. A Cloud Guru has the greatest
adoption right now.
24. NOW WE HAVE A CENTRE OF EXCELLENCE!
• It has been a real adventure to get to this point.
• We have Security, Cost and Best Practice
under control.
• We have adoption across the StepStone Group
(multiple brands and countries).
• We have the confidence of our Stakeholders.
So we’re done, right? Right?!
27. RESILIENCE – FUTURE STEPS
• Best Practice is a moving target.
• We already build out in multiple AZs, Regions etc.
• Ensure full Active / Active infrastructure everywhere.
• Not just tested: CHAOS TESTED (the ‘Terminate What You Like’ test).
• Game Days: Ensure team sharpness.
• AWS gives you the tools, but correct implementation is down to you.
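The idea behind a ‘Terminate What You Like’ test can be sketched as a toy in-memory simulation. This is not StepStone's tooling and makes no AWS calls. The fleet layout, instance names, zone labels and the minimum-capacity rules are all hypothetical. A real chaos test would terminate real instances under controlled conditions, for example during a Game Day.

```python
import random
from collections import Counter


def terminate_random(fleet, rng):
    """Remove one randomly chosen instance from the fleet and return it."""
    victim = rng.choice(fleet)
    fleet.remove(victim)
    return victim


def is_healthy(fleet, min_instances, min_zones):
    """Check the surviving fleet against hypothetical capacity rules."""
    zones = Counter(zone for _, zone in fleet)
    return len(fleet) >= min_instances and len(zones) >= min_zones


def chaos_round(fleet, kills, min_instances, min_zones, seed=None):
    """Terminate `kills` random instances; report whether the fleet survives."""
    rng = random.Random(seed)
    survivors = list(fleet)  # work on a copy, never the caller's list
    for _ in range(kills):
        if not survivors:
            break
        terminate_random(survivors, rng)
    return is_healthy(survivors, min_instances, min_zones), survivors


if __name__ == "__main__":
    # Hypothetical fleet: two instances in each of three zones.
    fleet = [
        ("web-1", "az-a"), ("web-2", "az-a"),
        ("web-3", "az-b"), ("web-4", "az-b"),
        ("web-5", "az-c"), ("web-6", "az-c"),
    ]
    ok, survivors = chaos_round(fleet, kills=2, min_instances=4, min_zones=2)
    print("survived" if ok else "FAILED", survivors)
```

With two instances per zone, any two terminations leave four instances in at least two zones, so this fleet passes whichever instances are chosen. Raising `kills` to three breaks the minimum-instance rule, which is the kind of gap a chaos test is meant to expose.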
29. WHAT HAVE WE LEARNED?
• Establish regular Community of Practice sessions. Get the right people there!
Spread the word!
• Encourage presentations from all teams, and include the outside world.
• Establish frameworks for Security and Cost Control that are easy to understand
and use to ensure adoption.
• Invest in the right tools to help (“CloudHealth is the difference between life and
death” – DevOps Lead)
• Continue to drive Best Practice to ensure the strongest architecture.