We’d like to introduce you to one of the most devastating ways to cause service instability in modern micro-service architectures: application DDoS.
A specially crafted application DDoS attack can cause cascading system failures often for a fraction of the resources needed to conduct a more traditional DDoS attack.
By Scott Behrens and Jeremy Heffner
10. Microservice Primer: API Gateways and Client Libraries
Interface for middle tier
services
Services provide client
libraries to API Gateway
Diagrams provided with permission by microservices.io
11. Microservice Primer: Circuit Breaker
Helps with handling service failures
How do you know what timeout to
choose?
How long should the breaker be triggered?
Diagrams provided with permission by microservices.io
12. Microservice Primer: Cache
Speeds up response time
Reduces load on services
fronted by cache
Reduces the number of servers
needed to handle requests
https://github.com/netflix/evcache
14. Less Common But Interesting Application DDoS
Queueing
Client Library Timeouts
Healthchecks
Connection Pool
Things That Don’t Autoscale
15. How Logical Work Per Request Differs
Most Monolithic Application DDoS
Often 1 to 1
Some Microservice Application DDoS
Often 1 to Many
16. PROXY
PROXY
PROXIES
API
GATEWAY
WEBSITE
Middle Tier Service
Middle Tier Service
Middle Tier Service
Middle Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Edge Middle Backend
New School Microservice API DDoS
> python grizzly.py
…
17. PROXY
PROXY
PROXIES
API
GATEWAY
WEBSITE
Middle Tier Service
Middle Tier Service
Middle Tier Service
Middle Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Edge Middle Backend
New School Microservice API DDoS
> python grizzly.py
…
Fallback or
Site Error
API Gateway making
many client requests
Middle tier services
making many calls to
backend services
Backend service
queues filling up with
expensive requests
Client Timeouts, circuit
breakers triggered,
fallback experience
triggered
18. API GATEWAY
Middle Tier Service
Middle Tier Service
Middle Tier Service
Middle Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Edge Middle Backend
Microservice API DDoS - Simplified
Single
request
asking for
1,000
objects not
in cache
20. Cache Miss Attack
Figure out what is in the cache
Make calls that require lookups outside of cache
21. API GATEWAY
Middle Tier Service
Middle Tier Service
Middle Tier Service
Middle Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Edge Middle Backend
Microservice API DDoS - Simplified
Middle tier has to call 3
backend services, or
6,000 calls.
Single
request
asking for
1,000
objects not
in cache
Middle tier client
library doesn’t
support batching
Results in 2 RPC calls
per object, or 2,000
RPC calls
22. Workflow for Identifying Application DDoS - Part 1
Identify the most latent service calls
Investigate if latent calls allow for manipulation
Determine API Gateway error conditions, thresholds, and library timeouts
Tune payload to fly under WAF/Rate Limiting
Test hypothesis
Scale your test using Cloudy Kraken (orchestrator) and Repulsive Grizzly (attack
framework)
24. Example Workflow - Ordinary API Call
POST data seems to be encrypted.
Modifying POST body seems to return
fast error code
Changing I/O didn’t result in increased
latency of API calls
25. Example Workflow - Interesting API Call
Adding more items to list results in
longer response time
Adding too many items results in a
special error code (exp. 502)
Changing I/O resulted in increased
latency of API calls
37. Workflow for Identifying Application DDoS - Part 2
Identify the most latent service calls
Investigate if latent calls allow for manipulation
Determine API Gateway error conditions, thresholds, and library timeouts
Tune payload to fly under WAF/Rate Limiting
Test hypothesis
Scale your test using Cloudy Kraken (orchestrator) and Repulsive Grizzly (attack
framework)
43. Cloudy Kraken - Key Features
Fresh global fleet of test instances each run
Lots of fresh global IP addresses
Easily push new attack code and configs
Coordinates the attack timing
Safety kill-switch
45. Update Code
Push the configuration file
and attack scripts to S3.
Reset the DynamoDB state.
Build
Environment
Configure the VPCs in each
region
Start up
Attack
Nodes
Launch instances
Collect
Data
Wait for data to come
through SNS
Tear-Down
Tear down and reset the
environment in each region
Cloudy Kraken Workflow
46. We scaled up, time to run the test!
Tested against prod
Multi-region and
multi-agent
Conducted two 5
minute attacks
Monitored for success
49. So What Failed?
Expensive API calls could be invoked with non-member cookies
Expensive traffic resulted in many RPCs per request (1:7200)
WAF/Rate Limiter was unable to monitor middle tier RPCs
50. API
GATEWAY
Middle Tier Service
Middle Tier Service
Middle Tier Service
Middle Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Backend Tier Service
Edge Middle Backend
Case Study API DDoS
5. Queue continues to
fill up, takes longer to
return objects
6. Middle Tier
services
returning
5XX and 200
status codes
1. Agents cycling
through multiple
session cookies and IP
addresses
7. API gateway starts to
trigger breakers,
returning some 403’s,
503’s, 200’s, 504’s.
3. Objects not in
Cache, Cache Miss
4. About 15
seconds to
return huge
object store8. Automatic scaling of
services in progress,
but not fast enough
9. CPU Spiking in API
Gateway, Gateway
Timeouts occurring
2. Each request asking
for 7,200 expensive
calculations from
multiple backends
59. Future Work
Automated identification of potential vulnerable endpoints
Autotuning attack parameters during exercise based up on error
condition criteria
Testing common open source microservice
frameworks/libraries/gateways
62. Repulsive Grizzly: Payload and Header Files
Provide payloads in any format you want
Headers are provided as a JSON key/value hash
Use $$AUTH$$ placeholder to tell grizzly where to place tokens
63. Microservice Primer: Architecture
Scale
Service independence
Fault isolation
Eliminates stack debt
Distributed system complexity
Deployment complexity
Cascading service failures if
things aren’t set up right
GOOD BAD