Christopher Batey presents on building fault-tolerant microservices on the JVM. He discusses common faults, such as slow or failed dependencies, and demonstrates how to reproduce them in automated tests using Wiremock and Saboteur. Batey also shows how to implement fault tolerance techniques such as timeouts, bounded queues, and circuit breaking using libraries like Hystrix. The presentation concludes with best practices for monitoring systems and for implementing fallback logic and kill switches for failed components.
2. @chbatey
Who am I?
• DataStax
- Technical Evangelist / Software Engineer
- Builds an enterprise-ready version of Apache Cassandra
• Sky: building a next-generation Internet TV platform
• Lots of time working on a test double for Apache Cassandra
Agenda
•Setting the scene
-What do we mean by a fault?
-What is a micro(ish)service?
-Monolith application vs the micro(ish)service
•A worked example
-Identify an issue
-Reproduce/test it
-Show how to deal with the issue
Small, horizontally scalable services
• Move to small, independently deployed services
- Login service
- Device service
- etc
• Move to a horizontally scalable database that can run active-active in multiple data centres
Fault tolerance
1.Don’t take forever - Timeouts
2.Don’t try if you can’t succeed
3.Fail gracefully
4.Know if it’s your fault
5.Don’t whack a dead horse
6.Turn broken stuff off
1 - Don’t take forever
• If at first you don’t succeed, don’t take forever to tell someone
• Timeout and fail fast
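A minimal sketch of "timeout and fail fast" in plain java.util.concurrent, offered as an illustration rather than as the talk's or Hystrix's implementation: the call runs on a separate, bounded pool and the caller waits at most timeoutMillis via Future.get(timeout, unit). TimeoutCall and DependencyTimeoutException are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical unchecked exception signalling that the time budget was exceeded
class DependencyTimeoutException extends RuntimeException {
    DependencyTimeoutException(String message) {
        super(message);
    }
}

// Hypothetical helper: runs calls on its own small, bounded pool and gives up after a time budget
class TimeoutCall implements AutoCloseable {
    private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
            2, 2, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<Runnable>(10));

    <T> T callWithTimeout(Callable<T> call, long timeoutMillis) {
        Future<T> future = pool.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; the caller stops waiting right now
            throw new DependencyTimeoutException("No answer within " + timeoutMillis + " ms");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Dependency failed", e.getCause());
        }
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }
}
```

Cancelling with interruption only frees the worker if the slow code reacts to interrupts; code blocked in some I/O calls may not, which is one more reason to keep the pool bounded.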
Adding an automated test
• Vagrant - launches and provisions local VMs
• Saboteur - uses tc and iptables to simulate network issues
• Wiremock - mocks HTTP dependencies
• Cucumber - acceptance tests
I can write an automated test for that?
[Diagram: a Vagrant + VirtualBox VM running the Movie Service, with Wiremock standing in for the User, Device and Pin services, plus Saboteur; the acceptance tests prime Saboteur to drop traffic, then reset it]
A simple Spring RestController
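import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class Resource {
    private static final Logger LOGGER = LoggerFactory.getLogger(Resource.class);

    @Autowired
    private ScaryDependency scaryDependency;

    @RequestMapping("/scary")
    public String callTheScaryDependency() {
        LOGGER.info("Resource later: I wonder which thread I am on!");
        // The dependency is called directly, on whichever thread is serving this request
        return scaryDependency.getScaryString();
    }
}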
Scary dependency
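import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class ScaryDependency {
    private static final Logger LOGGER = LoggerFactory.getLogger(ScaryDependency.class);

    public String getScaryString() {
        LOGGER.info("Scary Dependency: I wonder which thread I am on! Tomcats?");
        // Roughly half the calls are slow: no network fault is needed for a dependency to hang
        if (System.currentTimeMillis() % 2 == 0) {
            return "Scary String";
        } else {
            try {
                Thread.sleep(5000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the interrupt flag
                throw new IllegalStateException("Interrupted while being slow", e);
            }
            return "Slow Scary String";
        }
    }
}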
All on the Tomcat thread
13:47:20.200 [http-8080-exec-1] INFO info.batey.examples.Resource - Resource later: I wonder which thread I am on!
13:47:20.200 [http-8080-exec-1] INFO info.batey.examples.ScaryDependency - Scary Dependency: I wonder which thread I am on! Tomcats?
Scary dependency
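import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class ScaryDependency {
    private static final Logger LOGGER = LoggerFactory.getLogger(ScaryDependency.class);

    // Javanica: once HystrixCommandAspect is registered with Spring AOP, this method runs on a
    // Hystrix thread pool with Hystrix's default 1 second timeout, so the slow branch times out
    @HystrixCommand
    public String getScaryString() {
        LOGGER.info("Scary Dependency: I wonder which thread I am on! Tomcats?");
        if (System.currentTimeMillis() % 2 == 0) {
            return "Scary String";
        } else {
            try {
                Thread.sleep(5000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // Hystrix interrupts the thread on timeout
                throw new IllegalStateException("Interrupted while being slow", e);
            }
            return "Slow Scary String";
        }
    }
}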
What an annotation can do...
13:51:21.513 [http-8080-exec-1] INFO info.batey.examples.Resource - Resource later: I wonder which thread I am on!
13:51:21.614 [hystrix-ScaryDependency-1] INFO info.batey.examples.ScaryDependency - Scary Dependency: I wonder which thread I am on! Tomcats? :P
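The log shows the dependency moving from Tomcat's http-8080-exec-1 thread to hystrix-ScaryDependency-1. As a rough stdlib illustration of that idea (thread-pool isolation), not of Hystrix internals, this sketch gives one dependency its own bounded pool with recognisable thread names. IsolatedDependency and the "scary-dependency-" prefix are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical: a dedicated, bounded pool for one dependency, with recognisable thread names
class IsolatedDependency implements AutoCloseable {
    private final AtomicInteger threadNumber = new AtomicInteger();
    private final ThreadFactory namedThreads =
            runnable -> new Thread(runnable, "scary-dependency-" + threadNumber.incrementAndGet());
    private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
            2, 2, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<Runnable>(10), namedThreads);

    // Runs the work on the dependency's own pool and reports which thread executed it;
    // the caller's (e.g. Tomcat's) thread only waits for the result
    String whichThread() {
        Callable<String> work = () -> Thread.currentThread().getName();
        try {
            return pool.submit(work).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }
}
```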
Timeouts take home
• You can’t use network-level timeouts for SLAs
• Test your SLAs - if someone says you can’t, hit them with a stick
• Scary things happen without network issues
Don’t try if you can’t succeed
• Executors with unbounded queues or threads :(
- newFixedThreadPool (unbounded queue)
- newSingleThreadExecutor (unbounded queue)
- newCachedThreadPool (unbounded threads)
• Bound your queues and threads
• Fail quickly when the queue / maxPoolSize is met
• Know your drivers
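Executors.newFixedThreadPool is documented as operating off a shared unbounded queue, so requests to a slow dependency can pile up without limit. A minimal sketch contrasting it with a pool whose threads and queue are both bounded; BoundedExecutors is a hypothetical name, and the cast relies on the JDK currently returning a ThreadPoolExecutor from newFixedThreadPool (an implementation detail).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedExecutors {
    // The JDK factory: a fixed number of threads, but a queue with no practical bound
    static ThreadPoolExecutor jdkFixedPool(int threads) {
        return (ThreadPoolExecutor) Executors.newFixedThreadPool(threads);
    }

    // Bounded threads AND a bounded queue; AbortPolicy rejects new work once both are full
    static ThreadPoolExecutor boundedPool(int threads, int queueSize) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueSize),
                new ThreadPoolExecutor.AbortPolicy());
    }
}
```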
This is a functional requirement
• Set the timeout very high
• Use Wiremock to add a large delay to the requests
• Set queue size and thread pool size to 1
• Send in 2 requests to use the thread and fill the queue
• What happens on the 3rd request?
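The test described above can be approximated in-process, assuming a task blocked on a CountDownLatch as a stand-in for the request that Wiremock delays. With one thread and a one-slot queue, the first request takes the thread, the second waits in the queue and the third is rejected immediately by ThreadPoolExecutor.AbortPolicy. QueueFullScenario is a hypothetical name; this is a sketch, not the talk's acceptance test.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class QueueFullScenario {
    // Returns how many of three submissions a 1-thread, 1-slot-queue pool rejects
    static int rejectedOfThree() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(1), new ThreadPoolExecutor.AbortPolicy());
        CountDownLatch slowDependency = new CountDownLatch(1); // stands in for a very slow response
        Runnable slowCall = () -> {
            try {
                slowDependency.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        int rejected = 0;
        try {
            for (int i = 0; i < 3; i++) {
                try {
                    pool.execute(slowCall); // 1st occupies the thread, 2nd fills the queue
                } catch (RejectedExecutionException e) {
                    rejected++; // 3rd fails fast instead of waiting
                }
            }
        } finally {
            slowDependency.countDown(); // release the blocked calls
            pool.shutdown();
        }
        return rejected;
    }
}
```

In the real acceptance test the third HTTP request should come back quickly with an error or fallback rather than hang.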
Separate resource pools
• Don’t flood your dependencies
• Be able to answer the questions:
- How many connections will you make to dependency X?
- Are you getting close to your max connections?
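One way to make both questions answerable is a fixed permit budget per dependency (a bulkhead). This is a hand-rolled sketch with java.util.concurrent.Semaphore, not Hystrix's thread-pool isolation (Hystrix also offers a semaphore isolation strategy). Bulkheads, its methods and the "pin-service" name are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical bulkhead registry: a separate, fixed-size permit pool per dependency
class Bulkheads {
    private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();
    private final Map<String, Integer> limits = new ConcurrentHashMap<>();

    // "How many connections will you make to dependency X?": at most maxConcurrent
    void register(String dependency, int maxConcurrent) {
        limits.put(dependency, maxConcurrent);
        permits.put(dependency, new Semaphore(maxConcurrent));
    }

    // "Are you getting close to your max?": permits currently in use, for monitoring
    int inUse(String dependency) {
        return limits.get(dependency) - permits.get(dependency).availablePermits();
    }

    // Runs the call only if a permit is free; otherwise fails fast without touching the dependency
    <T> T call(String dependency, Supplier<T> call) {
        Semaphore semaphore = permits.get(dependency);
        if (!semaphore.tryAcquire()) {
            throw new IllegalStateException("Bulkhead full for " + dependency);
        }
        try {
            return call.get();
        } finally {
            semaphore.release();
        }
    }
}
```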
5 - Don’t whack a dead horse
[Diagram: a Play Movie request reaches the Movie Player, which depends on the User Service, Device Service and Pin Service]
What to do…
• Yes this will happen…
• Mandatory dependency - fail *really* fast
• Throttling
• Fallbacks
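A fallback for an optional dependency can be sketched as a plain wrapper: if the call fails, the request still succeeds with a degraded answer instead of failing outright. Fallbacks.withFallback is a hypothetical helper for illustration; with Hystrix the equivalent is overriding getFallback() on the command.

```java
import java.util.function.Supplier;

// Hypothetical helper: degrade gracefully when an optional dependency fails
final class Fallbacks {
    private Fallbacks() {
    }

    static <T> T withFallback(Supplier<T> call, Supplier<T> fallback) {
        try {
            return call.get();
        } catch (RuntimeException e) {
            // A real service would log or count the failure here before degrading
            return fallback.get();
        }
    }
}
```

A mandatory dependency has no sensible fallback, so it should not be wrapped this way; its failure should surface quickly instead.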
Implementation with Hystrix
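import com.codahale.metrics.annotation.Timed;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import org.apache.http.client.HttpClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Path("integrate")
public class IntegrationResource {
    private static final Logger LOGGER = LoggerFactory.getLogger(IntegrationResource.class);
    // Assumed: one HttpClient per downstream service, injected elsewhere (not shown on the slide)
    private HttpClient userService, deviceService, pinService;

    @GET
    @Timed
    public String integrate() {
        LOGGER.info("integrate");
        // Each downstream call is wrapped in its own HystrixCommand subclass
        String user = new UserServiceDependency(userService).execute();
        String device = new DeviceServiceDependency(deviceService).execute();
        Boolean pinCheck = new PinCheckDependency(pinService).execute();
        return String.format("[User info: %s]\n[Device info: %s]\n[Pin check: %s]\n",
                user, device, pinCheck);
    }
}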
Implementation with Hystrix
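import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.util.EntityUtils;

public class PinCheckDependency extends HystrixCommand<Boolean> {
    private final HttpClient httpClient;

    public PinCheckDependency(HttpClient httpClient) {
        super(HystrixCommandGroupKey.Factory.asKey("PinCheckService"));
        this.httpClient = httpClient;
    }

    @Override
    protected Boolean run() throws Exception {
        HttpGet pinCheck = new HttpGet("http://localhost:9090/pincheck");
        HttpResponse pinCheckResponse = httpClient.execute(pinCheck);
        int statusCode = pinCheckResponse.getStatusLine().getStatusCode();
        if (statusCode != 200) {
            EntityUtils.consume(pinCheckResponse.getEntity()); // release the connection
            throw new RuntimeException("Oh dear no pin check, status code " + statusCode);
        }
        String pinCheckInfo = EntityUtils.toString(pinCheckResponse.getEntity());
        return Boolean.valueOf(pinCheckInfo);
    }

    // Hystrix calls this when run() fails, times out, is rejected or the circuit is open:
    // returning true lets the movie play even when the pin service is unavailable
    @Override
    public Boolean getFallback() {
        return true;
    }
}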
Triggering the fallback
• Error threshold percentage
• Bucket of time for the percentage
• Minimum number of requests to trigger
• Time before trying a request again
• Disable
• Per instance statistics
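To make those knobs concrete, a deliberately simplified circuit breaker sketch. It is not Hystrix's implementation: it counts requests in one fixed time window rather than rolling buckets. It opens once at least a minimum number of requests were seen and the error percentage reaches the threshold, rejects calls during a sleep window, then lets one trial request through, and a manual forceOpen switch acts as a kill switch. Statistics are per instance. SimpleCircuitBreaker, its parameters and the injected clock are hypothetical.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Simplified sketch: one fixed counting window (not rolling buckets), per-instance statistics
class SimpleCircuitBreaker {
    private enum State { CLOSED, OPEN, HALF_OPEN }

    private final int errorThresholdPercent; // e.g. 50: open at >= 50% failures
    private final long windowMillis;         // time bucket the percentage is computed over
    private final int minimumRequests;       // don't trip on a handful of calls
    private final long sleepWindowMillis;    // time before trying a request again
    private final LongSupplier clock;        // injected so behaviour can be tested

    private volatile boolean forceOpen;      // manual kill switch
    private State state = State.CLOSED;
    private long windowStart;
    private int requests;
    private int failures;
    private long openedAt;

    SimpleCircuitBreaker(int errorThresholdPercent, long windowMillis, int minimumRequests,
                         long sleepWindowMillis, LongSupplier clock) {
        this.errorThresholdPercent = errorThresholdPercent;
        this.windowMillis = windowMillis;
        this.minimumRequests = minimumRequests;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    void setForceOpen(boolean forceOpen) { this.forceOpen = forceOpen; }

    synchronized boolean isClosed() { return state == State.CLOSED; }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (!allowRequest()) {
            return fallback.get(); // short-circuit: the dependency is not touched at all
        }
        boolean success = false;
        try {
            T result = action.get();
            success = true;
            return result;
        } catch (RuntimeException e) {
            return fallback.get();
        } finally {
            record(success);
        }
    }

    private synchronized boolean allowRequest() {
        if (forceOpen) return false;
        long now = clock.getAsLong();
        switch (state) {
            case OPEN:
                if (now - openedAt < sleepWindowMillis) return false;
                state = State.HALF_OPEN; // sleep window over: allow a single trial request
                return true;
            case HALF_OPEN:
                return false;            // a trial request is already in flight
            default:
                if (now - windowStart >= windowMillis) resetWindow(now);
                return true;
        }
    }

    private synchronized void record(boolean success) {
        long now = clock.getAsLong();
        if (state == State.HALF_OPEN) {
            if (success) { state = State.CLOSED; resetWindow(now); }
            else { state = State.OPEN; openedAt = now; }
            return;
        }
        if (state == State.OPEN) return; // late result from a call started before opening
        requests++;
        if (!success) failures++;
        if (requests >= minimumRequests && failures * 100 >= errorThresholdPercent * requests) {
            state = State.OPEN;
            openedAt = now;
        }
    }

    private void resetWindow(long now) {
        windowStart = now;
        requests = 0;
        failures = 0;
    }
}
```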
To recap
1.Don’t take forever - Timeouts
2.Don’t try if you can’t succeed
3.Fail gracefully
4.Know if it’s your fault
5.Don’t whack a dead horse
6.Turn broken stuff off