Slides for a talk given at RICON West 2013.
Video is available at the very beginning of Day 1, Track 2 here:
http://ricon.io/west2013.html
Full edited videos will be available at a later date.
This talk is about how StackMob uses Riak Core for distributing services.
36. Physical Server 1
SOA on Riak Core
Request
Webmachine
proxy_coord
CC
Instance
Physical Server 2
37. Physical Server 1
SOA on Riak Core
Request
Webmachine
proxy_coord
vnode
CC
Instance
Physical Server 2
38. Physical Server 1
SOA on Riak Core
Request
Webmachine
proxy_coord
vnode
instance_mgr
CC
Instance
Physical Server 2
39. Physical Server 1
SOA on Riak Core
Request
Webmachine
proxy_coord
vnode
instance_mgr
proxy_op
CC
Instance
Physical Server 2
41. Physical Server 1
SOA on Riak Core
Request
Deploy
Webmachine
proxy_coord
deploy_coord
vnode
instance_mgr
instance_mgr
proxy_op
deploy_op
CC
Instance
CC
Instance
Physical Server 2
42. SOA on Riak Core
Request
Physical Server 1
Physical Server 2
Physical Server 3
CC
Instance
47. Riak Core - Replication
K, V
Physical Server 1
Physical Server 2
Physical Server 3
48. SOA on Riak Core - Replication
Request
Service
Instance
Service
Instance
Service
Instance
Physical Server 1
Physical Server 2
Physical Server 3
55. SOA on Riak Core - Replication
Deploy
Physical Server 1
Physical Server 2
Physical Server 3
56. SOA on Riak Core - Replication
Deploy
Service
Instance
Service
Instance
Service
Instance
Physical Server 1
Physical Server 2
Physical Server 3
57. SOA on Riak Core
• Versioning (“Consistency”)
• Availability ✔
• Partition Tolerance
59. SOA on Riak Core - Handoff
Service
Instance
Service
Instance
Physical Server 1
Physical Server 2
Physical Server 3
60. SOA on Riak Core - Handoff
Service
Instance
Physical Server 1
Physical Server 2
Physical Server 3
66. SOA on Riak Core - Handoff
Service
Instance
Service
Instance
Physical Server 1
Physical Server 2
Physical Server 3
68. SOA on Riak Core
• Versioning (“Consistency”)
• Availability ✔
• Partition Tolerance ✔
70. SOA on Riak Core - Versioning
Deploy
(v2)
Service
(v1)
Service
(v1)
Service
(v1)
72. SOA on Riak Core - Versioning
/{service}/{version}/api/{endpoint}
73. SOA on Riak Core - Versioning
Request
Bastion
Service
(v1)
Service
(v1)
Service
(v1)
74. SOA on Riak Core - Versioning
Service A
(v1)
Service A
(v1)
Service B
(v1)
Service B
(v1)
76. SOA on Riak Core - Versioning
Service A
(v1)
Service B
(v1)
Service A
(v1)
Service B
(v1)
Service A
(v2)
Service B
(v2)
Service B
(v2)
Service A
(v2)
79. SOA on Riak Core
• Versioning (“Consistency”) ✔?
• Availability ✔
• Partition Tolerance ✔
80. SOA on Riak Core - Integration
Three key features
81. SOA on Riak Core - Integration
Three key features
• Management API (Startup/Shutdown)
82. SOA on Riak Core - Integration
Three key features
• Management API (Startup/Shutdown)
• Consistent Service Responses
83. SOA on Riak Core - Integration
Three key features
• Management API (Startup/Shutdown)
• Consistent Service Responses
• Status Check Endpoint
84. SOA on Riak Core
• Riak Core can work for any distributed system
85. SOA on Riak Core
• Riak Core can work for any distributed system
• Riak Core provides:
• Availability
• Partition Tolerance
86. SOA on Riak Core
• Riak Core can work for any distributed system
• Riak Core provides:
• Availability
• Partition Tolerance
• You need external coordination (Consistency)
87. SOA on Riak Core
• Riak Core can work for any distributed system
• Riak Core provides:
• Availability
• Partition Tolerance
• You need external coordination (Consistency)
• You provide:
• Management API
• Consistent Responses
• Health Check
So versioning can be a bit of a headache. But I’m sure you all have ways to solve this within your own clusters. This is, however, a little trickier for us, and I’ll talk about that in a moment.
So we’ve got these three requirements, and we’ve got this theorem that says we can only have two at any given time. While that will hold true, we’re gonna talk later about a workaround we used. Now, we at StackMob have had to solve these issues for all of our services, and we’ve come up with various ways to do it, but we do have one service that makes things a little more complicated.
Call back to original slide
We needed something that can do all these things automatically. So basically, we needed a database to store our services.
For that, there’s Basho’s own Riak Core! While Riak Core is generally used for storing and manipulating data, it’s really just a library for building distributed systems.
Now, I do wanna stress that this is quite the simplification. For example, this request doesn’t strictly come into a vnode; it comes into one of the physical servers in the ring, which then routes it to the appropriate vnode. But this is a convenient way to portray it, and so we’ll be using this sort of diagram.
So it would really look a little more like this.
It’ll look a little more like this, if we’re being precise. You’ve got your physical ring, your three joined servers, and a request comes into one of those.
But we’re gonna simplify and use this as our diagram for now, where these requests come into some form of coordinator.
So Riak Core also gives you replication, and that’s because, for replication, you can just walk around the ring. Now, this is a simplification, and Riak Core works in slightly more interesting ways. But this is how Riak Core attempts to distribute different replicas to different physical servers. There are relatively few strict guarantees, but it works pretty well!
So like I said earlier, what we really need is sort of a database, but one which stores services. So what if, instead of putting data in here, we put service instances?
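The "walk around the ring" idea can be sketched roughly as below. This is a simplified illustration, not Riak Core’s actual claim or preference-list algorithm; the round-robin ownership, the server names, and the functions `build_ring` and `preference_list` are all assumptions made for the example.

```python
def build_ring(servers, num_partitions):
    """Assign partitions to servers round-robin (an assumed, simplified layout)."""
    return [servers[i % len(servers)] for i in range(num_partitions)]


def preference_list(ring, start, n=3):
    """Walk clockwise from partition `start` and take the next n partitions."""
    size = len(ring)
    return [((start + i) % size, ring[(start + i) % size]) for i in range(n)]


servers = ["server1", "server2", "server3"]

# With 9 partitions, three neighbouring partitions always sit on three servers.
even_ring = build_ring(servers, num_partitions=9)
spread = preference_list(even_ring, start=7)

# With 8 partitions, the walk wraps around and lands on server1 twice.
uneven_ring = build_ring(servers, num_partitions=8)
clumped = preference_list(uneven_ring, start=6)
```

The second case is one way to see why there are few strict guarantees: when the number of servers does not divide the ring evenly, neighbouring partitions can end up on the same physical server.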
It looks sort of like this! The request comes in, and that request uses the service id as its ‘key’. So we hash that, figure out which vnode will service that request, and send the request along to the custom code instance.
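As a rough sketch of hashing a service id to pick a vnode, assuming a ring split into equal partitions: the choice of SHA-1, the ring size of 64, and the name `vnode_for` are assumptions for illustration, not the code from the talk.

```python
import hashlib

RING_SIZE = 64  # hypothetical number of partitions; must be a power of two


def vnode_for(service_id, ring_size=RING_SIZE):
    """Map a service id to the index of the partition (vnode) that owns it."""
    if ring_size <= 0 or ring_size & (ring_size - 1):
        raise ValueError("ring_size must be a power of two")
    digest = hashlib.sha1(service_id.encode("utf-8")).digest()
    key = int.from_bytes(digest, "big")       # a point on a 2**160 ring
    partition_width = 2 ** 160 // ring_size   # each vnode owns an equal slice
    return key // partition_width


index = vnode_for("my-custom-code-service")
```

The same service id always hashes to the same partition, so every node in the cluster can work out where a request belongs without asking anyone else.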
So in our case specifically, we put our custom code instances behind the request.
Let’s look a little deeper now into how we implemented this. Let’s just take this pipeline
…And rearrange it a bit
So here’s our stack. When a req comes in…
It comes into Webmachine, which serves the node’s HTTP interface.
We then create a coordinator to handle the request. In this case, we’ll create a proxy coordinator, since we’re proxying a request to our custom code instance. The coordinator uses consistent hashing to figure out which vnode should handle the request.
So a vnode’s just an actor, and Riak Core keeps track of them. Riak Core has this catalog of vnodes and provides the interface we use to communicate with them.
That vnode, in turn, is responsible for keeping track of its instances. So this vnode actor manages a bunch of instance manager actors, and those actors are in turn responsible for keeping track of the actual instance.
This instance manager is responsible for the whole instance lifecycle. It knows whether the instance is ready for service, and so on. However, it doesn’t handle the request directly. Instead, we create a new actor.
So, like the coordinator, the operation actor is responsible for only a single request, and it handles the lifecycle of sending that request to the custom code instance. It dies after the request is done. So now, we’ve gotta get the response back to the server.
To do that we end up skipping a few layers. This is mainly for performance. Since actors in Erlang can only process one message at a time, we don’t want to block on these requests in the vnode and instance manager, because those actors are responsible for all our requests for that instance. Instead, we go from the op directly back to the coordinator with our response. This is what the stack looks like when you proxy a request.
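The "skip a few layers" shape can be sketched with threads and a queue standing in for Erlang actors and mailboxes. This is only an illustration under those assumptions: the vnode layer is left out for brevity, and `InstanceManager`, `proxy_op`, and `proxy_coord` are hypothetical Python names borrowed from the slide labels, not the real implementation.

```python
import queue
import threading


def proxy_op(instance, request, reply_to):
    """Per-request op: call the instance, then reply straight to the coordinator."""
    reply_to.put(instance(request))


class InstanceManager:
    """Owns one instance; it never waits on a request itself."""

    def __init__(self, instance):
        # `instance` is a plain callable standing in for a custom code instance.
        self.instance = instance

    def handle(self, request, reply_to):
        # Spawn a short-lived op and return immediately, so the manager
        # stays free to process the next message for this instance.
        op = threading.Thread(
            target=proxy_op, args=(self.instance, request, reply_to)
        )
        op.start()
        return op


def proxy_coord(manager, request, timeout=1.0):
    """Per-request coordinator: hand off the request, then wait for the op."""
    mailbox = queue.Queue(maxsize=1)
    manager.handle(request, reply_to=mailbox)
    return mailbox.get(timeout=timeout)


manager = InstanceManager(instance=lambda req: f"handled {req}")
response = proxy_coord(manager, "GET /hello")
```

Only the coordinator and the op ever wait on a slow instance; the long-lived manager just hands the request off and moves on.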
Other requests work the same way. If you want to deploy a new instance, you create a coordinator, you go to the vnode, you create a new instance manager, and that instance manager starts up a deploy op actor.
And all that comes together to give us this
So this is pretty simple so far. Let’s start talking about what Riak Core gives us!
Riak Core is made for building distributed systems, and you can’t have a distributed system without replication.
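Riak uses R/W/N: N is the number of replicas, and R and W are how many of them must respond for a read or a write to succeed.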
This is replication, blah blah. So like I said earlier, what we really need is sort of a database, but which stores services. So what if, instead of putting data in here, we put service instances?
So this is how we ‘read’ from our service ‘database’: we send a request to a service, and get a response back. But at the same time, we’ve detected a failure and repaired that failure, while simultaneously making the request to the next available node. All automatically.
So the next question is, how do we start these services in the first place? Turns out you don’t always need to. In some cases, it’s better to just let a request come in, detect no instance, and send a repair command. That’s what we do in custom code (explain)
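A minimal sketch of that ‘read’ with repair, assuming the replicas are tried in order and a missing instance is simply restarted. The function `request_with_repair`, the `instances` dictionary, and the `start_instance` callback are hypothetical names for illustration, and the repair here runs inline rather than concurrently as described above.

```python
def request_with_repair(nodes, instances, service_id, request, start_instance):
    """Try each node in order; repair nodes missing the instance, answer from the first live one."""
    repaired = []
    for node in nodes:
        instance = instances.get((node, service_id))
        if instance is None:
            # Failure detected: start a replacement on this node, then
            # move on to the next node for the current request.
            instances[(node, service_id)] = start_instance(service_id)
            repaired.append(node)
            continue
        return instance(request), repaired
    raise RuntimeError("no instance available for " + service_id)


def start_instance(service_id):
    """Stand-in for deploying a service instance; returns a callable."""
    return lambda req: f"{service_id} handled {req}"


# server1 has lost its instance; server2 and server3 still have theirs.
instances = {
    ("server2", "svc"): start_instance("svc"),
    ("server3", "svc"): start_instance("svc"),
}
response, repaired = request_with_repair(
    ["server1", "server2", "server3"], instances, "svc", "ping", start_instance
)
```

After the call, the request has been answered by server2 and server1 has an instance again, so the next request can be served from there.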
This might look a little confusing, but basically what you’ve got here is two separate networks. Each of them is talking a different API, they don’t have to know about each other, and you can test your new stuff in production before you ever let an outside service talk to it.
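One way to picture version-based routing is to read the version out of the `/{service}/{version}/api/{endpoint}` path shown on the slides and use it to pick a network. The sketch below assumes that path layout only; the `networks` table, the host names, and the functions `parse_path` and `route` are made up for the example.

```python
def parse_path(path):
    """Split /{service}/{version}/api/{endpoint} into its three variable parts."""
    parts = path.strip("/").split("/", 3)
    if len(parts) != 4 or parts[2] != "api":
        raise ValueError("unexpected path: " + path)
    service, version, _, endpoint = parts
    return service, version, endpoint


def route(path, networks):
    """Pick the host for a request from the version-specific network."""
    service, version, endpoint = parse_path(path)
    return networks[version][service], endpoint


# Two separate networks, one per API version (hypothetical host names).
networks = {
    "v1": {"a": "a-v1.internal", "b": "b-v1.internal"},
    "v2": {"a": "a-v2.internal", "b": "b-v2.internal"},
}
host, endpoint = route("/a/v2/api/users/42", networks)
```

Because each version has its own table, v1 services never need to know that v2 exists, and v2 can be exercised before any outside traffic is pointed at it.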