2. @chbatey
Who is this guy?
● Enthusiastic nerd
● Senior software engineer at BSkyB
● Builds a lot of distributed applications
● Apache Cassandra MVP
3. Agenda
1. Setting the scene
○ What do we mean by a fault?
○ What is a microservice?
○ Monolith application vs the micro(ish) service
2. A worked example
○ Identify an issue
○ Reproduce/test it
○ Show how to deal with the issue
6. But different things go wrong...
down
slow network
slow app
2 second max
GC :(
missing packets
7. Fault tolerance
1. Don’t take forever - Timeouts
2. Don’t try if you can’t succeed
3. Fail gracefully
4. Know if it’s your fault
5. Don’t whack a dead horse
6. Turn broken stuff off
8. Time for an example...
● All examples are on github
● Technologies used:
○ Dropwizard
○ Spring Boot
○ Wiremock
○ Hystrix
○ Graphite
○ Saboteur
9. Example: Movie player service
[Diagram: several Shiny App instances, Play Movie, User Service, Device Service, Pin Service]
19. Wiremock + Saboteur + Vagrant
● Vagrant - launches + provisions local VMs
● Saboteur - uses tc, iptables to simulate network issues
● Wiremock - used to mock HTTP dependencies
● Cucumber - acceptance tests
20. I can write an automated test for that?
[Diagram: the Acceptance Test calls the Play Movie Service and primes Saboteur to drop traffic, then resets it; Wiremock stands in for the User Service, Device Service and Pin Service inside a Vagrant + VirtualBox VM, alongside Saboteur]
23. Implementing reliable timeouts
● Homemade: Worker Queue + Thread pool (executor)
● Hystrix
● Spring Cloud Netflix
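The homemade option can be sketched with nothing but java.util.concurrent: hand the call to a worker thread and wait on the Future for a bounded time. This is only an illustration of the idea, not code from the talk; TimeLimitedCaller, its fallback argument and the 200 ms limit are hypothetical names and values.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not from the talk): runs a call on a worker thread and
// stops waiting after a fixed time, whatever the dependency is doing.
class TimeLimitedCaller {
    private final ExecutorService workers;
    private final long timeoutMillis;

    TimeLimitedCaller(ExecutorService workers, long timeoutMillis) {
        this.workers = workers;
        this.timeoutMillis = timeoutMillis;
    }

    String call(Callable<String> dependency, String fallback) {
        Future<String> result = workers.submit(dependency);
        try {
            // The caller thread waits at most timeoutMillis, even if no
            // network-level timeout ever fires.
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            result.cancel(true); // interrupt the worker so it does not linger
            return fallback;
        } catch (ExecutionException e) {
            return fallback;     // the dependency itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            result.cancel(true);
            return fallback;
        }
    }
}

class TimeoutDemo {
    public static void main(String[] args) {
        // newFixedThreadPool has an unbounded queue; acceptable only for a short demo.
        ExecutorService workers = Executors.newFixedThreadPool(2);
        TimeLimitedCaller caller = new TimeLimitedCaller(workers, 200);

        String fast = caller.call(() -> "Scary String", "fallback");
        String slow = caller.call(() -> {
            Thread.sleep(10_000);
            return "Really slow scary string";
        }, "fallback");

        System.out.println(fast + " / " + slow);
        workers.shutdownNow();
    }
}
```

The time limit applies to the whole call as seen by the caller, which is what an SLA is measured against.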
24. A simple Spring RestController
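import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class Resource {
    private static final Logger LOGGER = LoggerFactory.getLogger(Resource.class);

    @Autowired
    private ScaryDependency scaryDependency;

    @RequestMapping("/scary")
    public String callTheScaryDependency() {
        LOGGER.info("RestController: I wonder which thread I am on!");
        return scaryDependency.getScaryString();
    }
}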
25. Scary dependency
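import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class ScaryDependency {
    private static final Logger LOGGER = LoggerFactory.getLogger(ScaryDependency.class);

    public String getScaryString() {
        LOGGER.info("Scary dependency: I wonder which thread I am on!");
        if (System.currentTimeMillis() % 2 == 0) {
            return "Scary String";
        } else {
            try {
                Thread.sleep(10000); // simulate a dependency that hangs for 10 seconds
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the interrupt flag
            }
            return "Really slow scary string";
        }
    }
}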
26. All on the tomcat thread
13:07:32.814 [http-nio-8080-exec-1] INFO info.batey.examples.Resource - RestController: I wonder which thread I am on!
13:07:32.896 [http-nio-8080-exec-1] INFO info.batey.examples.ScaryDependency - Scary dependency: I wonder which thread I am on!
27. Seriously this simple now?
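import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class ScaryDependency {
    private static final Logger LOGGER = LoggerFactory.getLogger(ScaryDependency.class);

    @HystrixCommand
    public String getScaryString() {
        LOGGER.info("Scary dependency: I wonder which thread I am on!");
        if (System.currentTimeMillis() % 2 == 0) {
            return "Scary String";
        } else {
            try {
                Thread.sleep(10000); // simulate a dependency that hangs for 10 seconds
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the interrupt flag
            }
            return "Really slow scary string";
        }
    }
}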
28. What an annotation can do...
13:07:32.814 [http-nio-8080-exec-1] INFO info.batey.examples.Resource - RestController: I wonder which thread I am on!
13:07:32.896 [hystrix-ScaryDependency-1] INFO info.batey.examples.ScaryDependency - Scary dependency: I wonder which thread I am on!
29. Timeouts take home
● You can’t use network level timeouts for SLAs
● Test your SLAs - if someone says you can’t,
hit them with a stick
● Scary things happen without network issues
31. Complexity
● When an application grows in complexity it
will eventually start sending emails
32. Complexity
● When an application grows in complexity it
will eventually contain queues and thread pools
33. Don’t try if you can’t succeed
● Executors with unbounded queues :(
○ newFixedThreadPool
○ newSingleThreadExecutor
○ newCachedThreadPool (unbounded threads rather than an unbounded queue)
● Bound your queues and threads
● Fail quickly when the queue / maxPoolSize limit is reached
● Know your drivers
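To make the difference concrete, here is a small sketch that compares the queue behind Executors.newFixedThreadPool with an explicitly bounded ThreadPoolExecutor. BoundedPools and its bounded(...) factory are hypothetical names; the cast to ThreadPoolExecutor relies on how current JDKs implement newFixedThreadPool and is used here only to inspect the queue.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPools {
    // Hypothetical factory: fixed number of threads, bounded queue, and
    // AbortPolicy so a full pool rejects new work instead of queueing forever.
    static ThreadPoolExecutor bounded(int threads, int queueSize) {
        return new ThreadPoolExecutor(
                threads, threads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize),
                new ThreadPoolExecutor.AbortPolicy());
    }

    public static void main(String[] args) {
        // Assumption: the JDK returns a plain ThreadPoolExecutor here (true for OpenJDK).
        ThreadPoolExecutor fixed = (ThreadPoolExecutor) Executors.newFixedThreadPool(4);
        ThreadPoolExecutor bounded = bounded(4, 10);

        // The fixed pool's queue can hold Integer.MAX_VALUE tasks: effectively unbounded.
        System.out.println("newFixedThreadPool queue capacity: "
                + fixed.getQueue().remainingCapacity());
        System.out.println("bounded pool queue capacity: "
                + bounded.getQueue().remainingCapacity());

        fixed.shutdown();
        bounded.shutdown();
    }
}
```

With the bounded variant, work that cannot be queued is rejected with a RejectedExecutionException, which the caller can turn into a fast failure.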
34. This is a functional requirement
● Set the timeout very high
● Use wiremock to add a large delay to the requests
● Set queue size and thread pool size to 1
● Send in 2 requests to use the thread and fill
the queue
● What happens on the 3rd request?
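The same experiment can be tried in miniature on a plain ThreadPoolExecutor, assuming the pool rejects work when full (AbortPolicy). This is a sketch, not the talk's acceptance test: ThirdRequestExperiment is a hypothetical name, and a CountDownLatch stands in for the large Wiremock delay.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ThirdRequestExperiment {
    // Returns true if the third request is rejected straight away.
    static boolean thirdRequestIsRejected() {
        // Thread pool size 1, queue size 1.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.AbortPolicy());

        // Stands in for the slow dependency: blocks until released.
        CountDownLatch release = new CountDownLatch(1);
        Runnable slowRequest = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        try {
            pool.execute(slowRequest); // request 1 occupies the only thread
            pool.execute(slowRequest); // request 2 fills the queue
            try {
                pool.execute(slowRequest); // request 3 has nowhere to go
                return false;
            } catch (RejectedExecutionException e) {
                return true; // failed quickly instead of waiting
            }
        } finally {
            release.countDown(); // let the blocked requests finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("third request rejected: " + thirdRequestIsRejected());
    }
}
```

Treating the rejection as an expected, tested outcome is what turns "fail quickly" into a functional requirement rather than an accident of configuration.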
42. Separate resource pools
● Don’t flood your dependencies
● Be able to answer the questions:
○ How many connections will you make to dependency X?
○ Are you getting close to your max connections?
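One way to be able to answer both questions is to give each dependency its own capped pool of permits. The sketch below uses a Semaphore per dependency; DependencyBulkhead, the service names and the limits are hypothetical, and Hystrix's own thread pools or semaphores would play this role in the talk's setup.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical bulkhead: caps concurrent calls to one dependency and
// reports how much of the cap is currently in use.
class DependencyBulkhead {
    private final String name;
    private final int maxConcurrent;
    private final Semaphore permits;

    DependencyBulkhead(String name, int maxConcurrent) {
        this.name = name;
        this.maxConcurrent = maxConcurrent;
        this.permits = new Semaphore(maxConcurrent);
    }

    <T> T call(Supplier<T> request, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback; // at the limit: do not flood the dependency
        }
        try {
            return request.get();
        } finally {
            permits.release();
        }
    }

    // "Are you getting close to your max connections?"
    int inUse() {
        return maxConcurrent - permits.availablePermits();
    }

    String describe() {
        return name + ": " + inUse() + "/" + maxConcurrent + " in use";
    }
}

class BulkheadDemo {
    public static void main(String[] args) {
        // Separate pools: a slow pin service cannot use up the user service's permits.
        DependencyBulkhead userService = new DependencyBulkhead("user-service", 10);
        DependencyBulkhead pinService = new DependencyBulkhead("pin-service", 5);

        System.out.println(userService.call(() -> "user info", "no user info"));
        System.out.println(pinService.call(() -> Boolean.TRUE, Boolean.FALSE));
        System.out.println(userService.describe());
        System.out.println(pinService.describe());
    }
}
```

Publishing inUse() as a metric per dependency gives a direct view of how close each pool is to its limit.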
43. So easy with Dropwizard + Hystrix
@Override
public void initialize(Bootstrap<AppConfig> appConfigBootstrap) {
    // Publish Hystrix metrics through Dropwizard's metric registry
    HystrixCodaHaleMetricsPublisher metricsPublisher =
            new HystrixCodaHaleMetricsPublisher(appConfigBootstrap.getMetricRegistry());
    HystrixPlugins.getInstance().registerMetricsPublisher(metricsPublisher);
}

metrics:
  reporters:
    - type: graphite
      host: 192.168.10.120
      port: 2003
      prefix: shiny_app
44. 5 - Don’t whack a dead horse
[Diagram: several Shiny App instances, Play Movie, User Service, Device Service, Pin Service]
45. What to do..
● Yes this will happen..
● Mandatory dependency - fail *really* fast
● Throttling
● Fallbacks
47. Implementation with Hystrix
@GET
@Timed
public String integrate() {
    LOGGER.info("I best do some integration!");
    // Each dependency call is wrapped in its own Hystrix command
    String user = new UserServiceDependency(userService).execute();
    String device = new DeviceServiceDependency(deviceService).execute();
    Boolean pinCheck = new PinCheckDependency(pinService).execute();
    return String.format("[User info: %s] \n[Device info: %s] \n[Pin check: %s] \n", user, device,
            pinCheck);
}
48. Implementation with Hystrix
public class PinCheckDependency extends HystrixCommand<Boolean> {
    // Constructor and httpClient field are not shown on the slide

    @Override
    protected Boolean run() throws Exception {
        HttpGet pinCheck = new HttpGet("http://localhost:9090/pincheck");
        HttpResponse pinCheckResponse = httpClient.execute(pinCheck);
        String pinCheckInfo = EntityUtils.toString(pinCheckResponse.getEntity());
        return Boolean.valueOf(pinCheckInfo);
    }
}
49. Implementation with Hystrix
public class PinCheckDependency extends HystrixCommand<Boolean> {
    // Constructor and httpClient field are not shown on the slide

    @Override
    protected Boolean run() throws Exception {
        HttpGet pinCheck = new HttpGet("http://localhost:9090/pincheck");
        HttpResponse pinCheckResponse = httpClient.execute(pinCheck);
        String pinCheckInfo = EntityUtils.toString(pinCheckResponse.getEntity());
        return Boolean.valueOf(pinCheckInfo);
    }

    // Used when run() fails, times out or the circuit is open
    @Override
    public Boolean getFallback() {
        return true;
    }
}
50. Triggering the fallback
● Error threshold percentage
● Bucket of time for the percentage
● Minimum number of requests to trigger
● Time before trying a request again
● Disable
● Per instance statistics
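How these settings interact is easier to see in a stripped-down circuit breaker. The sketch below is not Hystrix's implementation: it is single-threaded, uses one fixed time window instead of rolling buckets, and SimpleCircuitBreaker and all its parameter names are hypothetical. The clock is injected so the behaviour can be checked without sleeping.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Minimal, single-threaded sketch of a circuit breaker (an assumption-laden
// illustration, not how Hystrix is implemented).
class SimpleCircuitBreaker {
    private final int errorThresholdPercentage; // error threshold percentage
    private final int minimumRequests;          // minimum number of requests to trigger
    private final long windowMillis;            // bucket of time for the percentage
    private final long sleepWindowMillis;       // time before trying a request again
    private final LongSupplier clock;           // injectable time source in milliseconds

    // Statistics live in this object only: they are per instance.
    private long windowStart;
    private int requests;
    private int failures;
    private boolean open;
    private long openedAt;

    SimpleCircuitBreaker(int errorThresholdPercentage, int minimumRequests,
                         long windowMillis, long sleepWindowMillis, LongSupplier clock) {
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.minimumRequests = minimumRequests;
        this.windowMillis = windowMillis;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    <T> T call(Supplier<T> request, T fallback) {
        long now = clock.getAsLong();
        if (open && now - openedAt < sleepWindowMillis) {
            return fallback; // open: skip the dependency and fall back at once
        }
        if (now - windowStart >= windowMillis) {
            resetWindow(now);
        }
        requests++;
        try {
            T result = request.get();
            if (open) { // the trial request after the sleep window succeeded
                open = false;
                resetWindow(now);
            }
            return result;
        } catch (RuntimeException e) {
            failures++;
            boolean overThreshold = requests >= minimumRequests
                    && failures * 100 >= errorThresholdPercentage * requests;
            if (open || overThreshold) {
                open = true; // trip, or stay open after a failed trial request
                openedAt = now;
            }
            return fallback;
        }
    }

    boolean isOpen() {
        return open;
    }

    private void resetWindow(long now) {
        windowStart = now;
        requests = 0;
        failures = 0;
    }
}
```

With a threshold of 50%, a minimum of 4 requests and a 5 second sleep window, four consecutive failures open the circuit; further calls return the fallback without touching the dependency until 5 seconds have passed, at which point one request is let through to test it.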
51. 6 - Turn off broken stuff
● The kill switch
52. To recap
1. Don’t take forever - Timeouts
2. Don’t try if you can’t succeed
3. Fail gracefully
4. Know if it’s your fault
5. Don’t whack a dead horse
6. Turn broken stuff off
57. Hystrix metrics
● Failure count
● Percentiles from Hystrix’s point of view
● Error percentages
58. How to test metric publishing?
● Stub out graphite and verify calls?
● Programmatically call graphite and verify numbers?
● Make metrics + logs part of the story demo