2. About Gousto
• An online recipe box service
• Customers come to our site, or use our
apps and select from 12 meals each week.
• They pick the meals they want to cook and
say how many people they’re cooking for
• We deliver all the ingredients they need in
exact proportions with step-by-step recipe
cards.
• No planning, no supermarkets and no
food waste – you just cook (and eat)
• We’re a rapidly growing business
• With a diverse set of technology
requirements.
• And it needs to be delivered yesterday…
3. In the beginning…
• Been on AWS from the outset
• But getting very little value from being on
the platform
• Developed a 2 tier API driven platform.
• Suffered from tight coupling between API
and front end
• We had built ourselves a 2 tier monolith
• We were experiencing a lot of growing
pains:
• 2 hour deployment process
• Difficult to introduce new clients
• 150k lines of code in a single repo –
onboarding devs was slow
• All infrastructure managed through the
console – Massive risk
• Production deployments happening every
2 weeks at most
[Diagram: the 2-tier monolith – Users and CMS Users hit the Web Site (PHP Laravel), which talks to the Business Logic (PHP Laravel) over tightly coupled APIs backed by the Production Database; Data Scientists run reporting against a Read Replica]
4. Build a platform, not a collection of independent services
[Diagram: anatomy of a Gousto Service – composed from Platform Templates (Network, Content, Security, …), Ansible Roles (Common Service, NGINX, …) with Platform.yml tests, plus standard resources (Data & Cache, Auto Scaling & ELBs, Alarms & Logs); an Automated Deployment Pipeline publishes the CloudFormation platform templates and the Ansible roles and service templates to the Platform Deployment Bucket]
5. This creates consistency from the outset
• Upfront investment made consistency the
path of least resistance
• A service is defined as:
• A deployment descriptor
• Additional Ansible roles
• CloudFormation additions and overrides
• The source code
• This approach bakes in consistency and
best practice
• Still allows complete customisation where
really needed
• Language agnostic
• Our first 2 [Micro]services were our
monoliths!
Service Repository:
  src/
  ansible/
    roles/
    playbook.yml
  cloudFormation/
    overrides/
    additions/
  service.yml
10. Create a platform wide event log
[Diagram: platform-wide event log – the Customer, Activity Log, Order, Product and Recipe Services (among others) publish to SNS; an AWS Lambda subscribed to all messages writes common events to Amazon DynamoDB, exposed through an Event API; a second Lambda subscribed to a subset of messages feeds Amazon Redshift for the Data Scientists; code is deployed from the Platform Deployment Bucket]
12. n services and >1 team = lots of ways to paginate…
• We had standardised our infrastructure
and deployment pipelines but neglected
our APIs
• Needed to create a consistent
implementation of
• Data representation
• Error handling
• Pagination
• Media management
• Versioning
• …
• We used JSON Schema to standardise
primitives
• Test conformity to standards within your
CI pipeline
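The talk standardised primitives with JSON Schema and tested conformity in CI. As a stdlib-only illustration of the same idea (field names are invented, not Gousto's actual standard; a real pipeline would run a JSON Schema validator such as the jsonschema library against response fixtures):

```python
# Minimal sketch of standardising a pagination primitive and checking
# conformity in CI. Field names are illustrative assumptions; a real
# setup would validate against a proper JSON Schema document.

PAGINATION_SCHEMA = {
    "page": int,       # 1-based page index
    "per_page": int,   # items per page
    "total": int,      # total items across all pages
    "items": list,     # the page of results
}

def conforms(payload: dict) -> bool:
    """Return True if payload carries every pagination field with the right type."""
    return all(
        key in payload and isinstance(payload[key], expected)
        for key, expected in PAGINATION_SCHEMA.items()
    )

# A CI test would assert every service's response fixtures conform:
good = {"page": 1, "per_page": 25, "total": 120, "items": [{"id": 7}]}
bad = {"page": "1", "items": []}  # wrong type, missing fields
```

Running this kind of check on every build is what turns a written standard into an enforced one.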
14. Don’t forget about your async messaging
• Consistent Async messaging is as vital as
our APIs
• Most messages announced CUD events on
domain objects (order-created)
• Message contents was normally the JSON
representation of the object
• Accessing common metadata was brittle
• So we introduced a message Schema:
• Required message senders to extract
consistent metadata from the message
payload
• Helps to prevent model changes causing
cascading breaks across services.
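A sender-side envelope along these lines illustrates the idea (the metadata field names here are assumptions for the sketch, not Gousto's actual message schema):

```python
# Sketch of a message envelope built by the sender: common metadata is
# extracted once at publish time, so consumers never have to reach into
# the domain payload. Field names are illustrative only.
import json
from datetime import datetime, timezone

def build_message(event_type, actor, owner, payload):
    """Wrap a domain object in an envelope with consistent metadata."""
    return json.dumps({
        "meta": {
            "event_type": event_type,  # e.g. "order-created"
            "actor": actor,            # who made the change
            "owner": owner,            # who the object belongs to
            "occurred_at": datetime.now(timezone.utc).isoformat(),
        },
        "data": payload,               # the domain object itself
    })

msg = json.loads(build_message("order-created", "user-42", "user-42", {"order_id": 99}))
```

Because consumers read only `meta`, the domain model under `data` can evolve without cascading breaks.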
15. Insufficient log management & request tracing
• CloudWatch and CloudWatch Logs aren't sufficient to support a Microservice platform in production
• The UI lacks the functionality needed to locate specific events in a single Log Group
• Don’t even think about trying to correlate
events across multiple groups…
• Why did we use it:
• It was already there
• Integration with Cloudwatch alarms was a
simple way to start filtering and
monitoring log events
• Very easy to get started and well
supported in CloudFormation
• We’re now exploring alternatives (ELK
stack + injecting UUID for each
request)
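The per-request UUID idea can be sketched as simple middleware (plain WSGI is used for illustration; the header and logger names are assumptions, not Gousto's actual setup):

```python
# Sketch of per-request correlation for log tracing: attach a UUID to each
# incoming request and log it, so events can later be correlated across
# services (e.g. in an ELK stack). Header and logger names are assumptions.
import logging
import uuid

def correlation_middleware(app):
    """Wrap a WSGI app so every request carries a correlation id."""
    def wrapped(environ, start_response):
        # Reuse an upstream id if one was forwarded, otherwise mint a new one
        cid = environ.get("HTTP_X_CORRELATION_ID") or str(uuid.uuid4())
        environ["HTTP_X_CORRELATION_ID"] = cid
        logging.getLogger("app").info("request start [%s]", cid)
        return app(environ, start_response)
    return wrapped

# Demo: wrap a trivial app and observe the injected id
env = {}
app = correlation_middleware(lambda environ, start_response: [b"ok"])
body = app(env, lambda *a: None)
```

Forwarding the same id on any downstream calls is what makes cross-service correlation possible.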
16. AWS has allowed us to move faster
So far
• We currently have 9 services in
production, launching 2 more in the
coming months
• Upfront investment has driven consistency
into our services
• Moved from 1 production deployment per
sprint to approx. 3 per day
• Gone from 150k+ lines of code in a repo to around 3-4k
Next
• Looking to use Lambda for more of our
smaller services
• Will be investing in a CLI to make
bootstrapping services even quicker
[Chart: Weekly Production Deployments by week (weeks 2–33), climbing from 0 to roughly 30 per week, broken down by service: core, fe, platform, gateway, products, auth, gifts, admin, maintenance]
VP Engineering at Gousto
Joined last summer
Before that I was actually a Solutions Architect at AWS working with retail customers across the UK
I wanted to spend 30 minutes today talking about how we’ve used AWS to develop our Microservices platform
I’ve made the micro bit optional as I think a lot of what I talk about is applicable to any service oriented architecture – Our services are probably small but not micro!
Hands up if you attended the AWS loft closing party last month? Apologies – You may have heard some of this before!
Before the tech
Wanted to talk about “Who are Gousto, what do we do?”
Question: Who has heard of Gousto? Keep your hands up if you’ve been a customer? Did you like it?
If you're not a customer and you'd like to try us we have some discount vouchers at the front – and they come with a free pack of nigella seeds. I didn't know what they were either until I started to work at Gousto
We’re an online recipe box service etc…
Customers come to our site and choose from a selection of weekly meals
We deliver all the ingredients in exact proportions along with step-by-step recipe cards
We take care of all of the planning, preparation and shopping – leaving you with the fun parts the cooking and the eating
We’re a rapidly growing business
In terms of volume
But also in the breadth of our offering (more meals, more convenience)
This makes for a diverse set of technology requirements (e-commerce, logistics, warehouse management)
We're a start-up – it's all wanted yesterday!
A massive appetite to move quickly!
So where did it start
Gousto have been on AWS from the outset.
However, we weren't getting any real value from the platform
Typical of how customers first use AWS – still just hired tin (rented servers), only now hired by the hour
Initially we developed a fairly typical 2-tier, API-driven platform
However we suffered from very tight coupling between the API and the web views making it hard to change one without the other
We’d built a 2 Tier Monolith
This was causing us a lot of pain as we grew
No CI/CD, manual deployments, building our own AMIs – each deploy took 2 hours of a developer's time
Tight coupling between API and FE meant it was impossible to deploy independently and very difficult to introduce new clients like mobile apps
Our codebase was getting larger and larger. Not massive by most people's standards, but it was growing and we were already seeing a slowdown in velocity and onboarding
Everything was managed via the console, security group rules, ASG changes, instance management – MASSIVE RISK
This was all resulting in a deployment every 2 weeks – not good enough for the business.
Breaking up our monolith would help to solve these issues
Danger of swapping 1 big problem for lots of smaller ones
Needed to invest heavily in our own platform, a container for our microservices that provides certain conventions and best practices
CLICK
First thing stable, foundation on which to deploy microservices
No more making changes in console
Same rigorous process of Pull requests and peer reviews for infra as we already have for application code
We developed a set of CloudFormation templates that defined various layers of our stack – this allowed us to manage our infrastructure as code
TOP TIP: Writing raw cfn templates can be cumbersome.
There are libraries out there to define templates in code; we decided to use templates to generate templates – Jinja2. We've found this very flexible
Use for Common fragments, looping and variable substitution
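A sketch of the idea – the variable and resource names below are invented for illustration; the point is that a Jinja2 source template handles common fragments, looping and substitution before the final CloudFormation template is emitted:

```yaml
{# Hypothetical Jinja2 source that renders to a CloudFormation template.
   "services" and "vpc_id" are illustrative template variables. #}
Resources:
{% for service in services %}
  {{ service.name }}SecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: "Access to {{ service.name }}"
      VpcId: {{ vpc_id }}
{% endfor %}
```

Rendering once per environment yields plain, reviewable CloudFormation with no duplication in the source.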
CLICK
Then invested in a continuous deployment platform using Codeship
provides repository driven deployments
Now had a stable foundation, we could produce carbon copies of our platform when standing up new environments
How to get consistency across services?
CLICK
We again relied on CloudFormation to create a “Cookie Cutter” of a Gousto Microservice
All services consist of ASG, ELB etc.
Also standard resources to help normalize how these services can be supported in production
Standard definitions of other resources such as DynamoDB tables, elasticache clusters etc
Standard deployment approach using CloudFormation's rolling updates on ASGs
CLICK
We also wanted to standardise the way we provisioned our EC2 instances. We chose Ansible.
Platform provided roles for security hardening, monitoring software. Common services (NGINX, PHP etc)
Integrated Ansible into CFN, run on instance start (nothing to do for the devs)
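The wiring looked roughly like this – paths and bucket parameter names are assumptions for the sketch, not the actual templates – with CloudFormation's UserData running the playbook on first boot so developers have nothing extra to do:

```yaml
# Illustrative LaunchConfiguration fragment: fetch the platform's Ansible
# bundle and run it locally on instance start. Paths are invented.
UserData:
  Fn::Base64: !Sub |
    #!/bin/bash
    aws s3 cp s3://${PlatformDeploymentBucket}/ansible.tar.gz /opt/ansible.tar.gz
    mkdir -p /opt/ansible && tar -xzf /opt/ansible.tar.gz -C /opt/ansible
    ansible-playbook -i localhost, -c local /opt/ansible/playbook.yml
```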
CLICK
Integrated into the CD pipeline to package up a microservice starter kit
versioned and deployed to S3
This was a lot of upfront work before we started writing our first microservice
We wanted to encourage developers to build services in a consistent manner
Building out the platform and a CD pipeline made consistency the path of least resistance
Now its easy to get up and running with a service:
Create a new repo
Add a src folder with your code
Include a service.yml file to define
Instance sizes
Scaling policies
Additional alarms etc
Define what Ansible roles you wish to use
Commit
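A hypothetical service.yml illustrates the shape of the deployment descriptor described above (every key name here is invented for the sketch, not Gousto's actual format):

```yaml
# Hypothetical deployment descriptor – key names are illustrative only
service: recipe-service
instance_type: t2.medium
scaling:
  min: 2
  max: 6
  target_cpu: 60
alarms:
  - name: high-5xx-rate
    threshold: 5
ansible_roles:
  - nginx
  - php
```

The pipeline merges a descriptor like this into the platform templates, so a commit is all it takes to get a fully provisioned service.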
We still provide maximum flexibility
Develop an Ansible role to do any final configuration
Still have the option to include any arbitrary CloudFormation
---------------------------
This approach bakes in consistency and best practice to every service we develop
However, we still provide complete customisation when needed
This approach is also language agnostic
The ultimate test was to migrate our existing monoliths onto the microservices platform – it worked.
I like to think of an analogy around car manufacturing
There used to be a defined process, but it was full of manual steps
There was a rough order but this was sometimes not followed
Things were (mostly) documented but the process relied on inferred knowledge
But now we have fully automated production lines manned by robots – no human error, no variability
And the quality of the product has gone up massively
So I just wanted to highlight some of the areas where AWS services particularly helped.
Often in ways that are just not possible in our own data centres or even with other cloud providers
Controversial I know…
Its not perfect
Updates still take way too long
Would be great to have a way to enforce alignment to the template (remove config drift)
Better upfront validation
Its what makes the cloud the cloud
Wanted to highlight some of the areas where we’ve found the AWS services to be incredibly useful in moving forwards quickly
One of those was in asynchronous messaging between services
We knew it was vitally important to remove synchronous calls between services within the client request response loop where possible
Synchronous calls can kill performance and increase brittleness of the system
However, we didn’t want to manage our own message bus on AWS
We rely heavily on SNS and SQS to provide universal notifications and messaging between services.
Supported on our platform: just declare what events you wish to publish and consume and the platform will handle the SNS topics, SQS queues and associated subscriptions
We have found this messaging pattern to be indispensable – it drives the vast majority of communication between our services so far
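The declaration-to-resources mapping can be sketched as pure naming convention (the conventions below are invented for illustration; the real platform renders these into CloudFormation rather than computing them at runtime):

```python
# Sketch: derive the SNS/SQS resources a service needs from its declared
# events. Naming conventions are illustrative assumptions only.

def resources_for(service, publishes, consumes):
    """Map declared events to topic, queue and subscription names."""
    topics = [f"{event}-topic" for event in publishes]
    queues = [f"{service}-{event}-queue" for event in consumes]
    # Each consumed event's queue is subscribed to that event's topic
    subscriptions = [
        (f"{event}-topic", f"{service}-{event}-queue") for event in consumes
    ]
    return {"topics": topics, "queues": queues, "subscriptions": subscriptions}

r = resources_for("order-service",
                  publishes=["order-created"],
                  consumes=["recipe-updated"])
```

Giving every service its own queue per consumed topic is the standard SNS-to-SQS fan-out pattern: publishers stay ignorant of their consumers.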
TOP TIP: spend time thinking about messaging standards, just as important as your API standards
This asynchronous messaging approach also has a number of useful benefits:
One common challenge in a microservice environment is obtaining a consistent view of what happened when
Understanding how items like orders and customer profiles change over time and by whom is essential for our customer service team.
As we grow its also important we can provide audit logs of all activity on the platform.
Our messaging and notification platform has made this easy
We’ve created a lambda function that subscribes to all messages coming from all Microservices.
The message is then archived to S3 and a common event containing data such as actor, object owner, action type and even the user agent is written to a DynamoDB table
Another Lambda function then exposes this data as a common event API
This is then integrated into our CC tools
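The archiving function can be sketched like this – the envelope field names are assumptions, and a list stands in for DynamoDB (real code would use boto3 to write to S3 and DynamoDB):

```python
# Sketch of the event-log Lambda: for each SNS record, pull out the common
# metadata and append it to an archive (a list stands in for DynamoDB here).
# Envelope field names are illustrative assumptions.
import json

def handler(event, context=None, archive=None):
    archive = archive if archive is not None else []
    for record in event["Records"]:
        message = json.loads(record["Sns"]["Message"])
        meta = message["meta"]
        archive.append({
            "actor": meta["actor"],
            "owner": meta["owner"],
            "action": meta["event_type"],
            "user_agent": meta.get("user_agent"),
        })
    return archive

# Demo with a hand-built SNS event
sample = {"Records": [{"Sns": {"Message": json.dumps(
    {"meta": {"actor": "u1", "owner": "u2", "event_type": "order-created"}}
)}}]}
log = handler(sample)
```

Because every message carries the same metadata envelope, one generic function can log activity from every service on the platform.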
CLICK
In addition to this we also feed those same messages into Snowplow, our event-based analytics tool; this allows us to process events in real time and make the data available to our data scientists via Redshift
So far I’ve highlighted all the things we did well
This was the version of the deck I presented internally to our leadership team
But we have also made a bunch of mistakes along the way
My most recent experience at AWS was all about designing Infrastructure on the AWS platform
That is where I concentrated my efforts when I joined Gousto
We ended up with a fully automated CI platform, making use of Infrastructure as code to enforce consistency and automation across our infrastructure and other cloud components
We had a cookie cutter for microservice
But we hadn't invested the same amount of effort into our software design – In particular our APIs
We now had numerous teams working on separate and distinct codebases
Quickly needed to agree a set of standards to enforce consistent implementations of:
Data representation (more ways to represent a boolean than I could ever imagine)
Error handling
Pagination
Media management
Versioning
And of course, as the team grows the existence of standards isn't enough, you need to test for conformity within your CI pipeline
As well as standards we also struggled initially with a lack of documentation around our APIs
It was fine until we employed an external agency to create our initial iPhone app.
We've since standardized on Swagger to document our APIs
We take a DDD approach to developing new APIs
Code your swagger spec using the editor
Our CI pipeline will pick up your api.yml file and generate static docs
These are versioned and published to S3
Once reviewed you can start writing tests… and then eventually the code.
Feels like a laborious process when you describe it, but we've found it's sped us up in the long run.
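A minimal sketch of what such a spec-first api.yml might look like, assuming Swagger 2.0 (the paths and descriptions are invented for illustration):

```yaml
# Hypothetical api.yml fragment – written and reviewed first, then used to
# drive generated docs, tests and finally the code
swagger: "2.0"
info:
  title: Recipe Service
  version: "1.0.0"
paths:
  /recipes:
    get:
      summary: List recipes
      responses:
        "200":
          description: A paginated list of recipes
```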
And of course – when thinking about standards it's not all about the API – don't neglect your async messaging standards. We did, and it's painful!
Most of our messages announce a Create/Update/Delete event on a particular domain model within our system (order-created)
Most service teams simply dumped the order into the SNS message and forgot about it.
Other service teams started to pick these up and use them.
It’s a very brittle way of going about things
Putting the emphasis on the message consumer to extract common metadata such as the object owner, the actor (who made the change), or when the event happened meant that model changes caused cascading failures within our system
Instead, put the responsibility for metadata extraction on the message sender
That way as the model changes the sender can ensure consistency in the metadata at least
We defined a schema – super useful for the platform-wide event log I discussed earlier.
So to wrap up – where are we now
Upfront investment in our platform has allowed us to grow quickly to 11 services running in production
Services are stable with a low support overhead due to consistency
We are seeing the results already – Number of prod deployments is steadily climbing, every bug and story can now be deployed independently
The size of our services has massively reduced – from 100k+ lines down to 3-4k
Whats next?
We think we can go a lot faster. Looking at Lambda to reduce the moving parts further
Now that we have a standard representation of a service we can build better tooling to help devs bootstrap new services far quicker.
Wouldn't be a tech startup talk without a shameless recruitment plug
Our blog, latest article on what we're doing on snowplow