In this webinar, we review the benefits of deploying a microservices architecture with Cassandra as your backbone, making your applications highly reliable. We discuss in detail:
- How to create microservices in Node.js with ExpressJS and Seneca
- Tuning the Node.js driver for Cassandra: error handling, load balancing and degrees of parallelism
- Additional best practices to ensure your systems are highly performant and available
The sample service is available on GitHub: https://github.com/jorgebay/killr-service
These are the high-level topics we'll be getting into:
- We start with an introduction to microservices: usage and common patterns.
- Second, we make the case for Node.js and Apache Cassandra, and what makes them an excellent choice for microservices.
- Then, we will talk about the Node.js driver: tuning, best practices and some of the driver internals.
- Finally, we will put it all together in a demo, called killr-service, based on the killrvideo schema used by Patrick McFadin to demonstrate data modeling techniques. With this project we will try to cover what we discussed in the previous topics.
Let's start with a brief bit of theory on microservices. There is a lot of buzz around microservices, but there is no precise definition.
There are, however, certain characteristics that define them.
Beyond the hype, microservices is an architectural approach to developing applications as a suite of services.
Each service is a component that is developed around a functionality, a unit of software that can be independently replaceable and upgradeable.
Each service runs in its own process.
Each service should have its own data storage. Each service generates events that can be picked up by other microservices to perform specific actions.
In the following slides, we are going to describe the reasons behind this pattern.
It's hard to talk about microservices without talking about where we all came from.
We are used to building monolithic applications as a single unit, where all the logic for handling user requests runs in a single process.
A system with multiple software components that depend on each other is relatively easy to develop and deploy: you end up with a single package (e.g. a jar).
But monoliths have their problems, they tend to grow big, making it hard to implement new features without a deep knowledge of the internals.
Over time, it becomes difficult to understand how to correctly implement a change, and the quality of the code declines.
To add functionality your application was not originally designed for, you are sometimes forced to "hack" into the code base.
Additionally, a monolith means a long term commitment to a technology stack, making it hard to benefit from newer technology.
This pattern allows cross-functional teams, organized around a business capability, to be responsible for the whole life cycle of a component.
Keeping things that change at the same time in the same module enables us to implement changes quickly and make the system evolve faster.
That is why each microservice has its own data: to manage schema changes independently.
It is even OK to have multiple versions of the same service online.
There is no single point of failure: if one of the services goes down, the application can remain operational.
As a microservice is loosely coupled with others, it allows teams to choose the technology stack (from the programming language to the database).
A microservices architecture makes sense under the premise that systems are long lived but versions of the services are short lived.
But it is not all good news, some parts of your development life cycle do get harder.
Now you have to deal with inter-service communication: each service is a different process, probably located on a different server, so there must be a network protocol for communication.
The microservices architecture replaces one monolithic application with multiple services per application, making deployment non-trivial.
Let's look at a basic design of a monolithic architecture for a server application.
You can use layers to allocate different responsibilities of the application, typically:
- Presentation layer
- Service layer
- Data access layer
Each layer has different components.
Components can call each other and boundaries are not defined in the design.
For example, the order service can call the products, orders and customer data access components to process an order.
Now, let's consider the microservices architecture; this looks simple enough.
Components are functionally decomposed. Each service contains the logic to deal with a specific functionality.
Each service has its own data storage; you can even think of the service as the group containing both the package and the database.
Let's focus now on two of the services: Orders and Products.
Keep in mind that there can be multiple instances of each service running in production.
If we start making direct calls from one service to the other, we end up with the same problems as the monolithic pattern. On top of that, we must implement the logic for handling other services going down, and each service needs to know details of the others, like addresses and contracts.
[Next] That is why we should introduce a message broker.
[Next][Next] That deals with messages coming from one service and delivers them asynchronously.
About communication between services, there are a few principles that we can follow.
We publish and consume events, for example: an order was made, or a product has been restocked.
We just have to worry about the event being received by the broker, not about what other services have to do with it.
Start simple and cheap, it doesn't have to be a fancy enterprise service bus, think of it as a dumb pipe.
A message queue protocol is a good fit; products like RabbitMQ and ActiveMQ are fully featured.
Now, let's look at the complete example, focused on the Order service.
The order service package communicates with a database and the message broker.
The order service publishes events and consumes events from other services or systems.
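To make the publish side concrete, here is a minimal, broker-agnostic sketch. The `channel` interface, the `orders` topic and the `publishOrderMade` helper are all hypothetical; in production the channel would be backed by RabbitMQ, ActiveMQ or a similar product.

```javascript
// Hypothetical sketch: publishing an "order made" event to a message broker.
// `channel` is any object exposing publish(topic, message).
function publishOrderMade(channel, order) {
  // Serialize the event payload; consumers only need the event data,
  // not a reference to this service.
  const event = {
    type: 'order.made',
    orderId: order.id,
    timestamp: Date.now()
  };
  channel.publish('orders', JSON.stringify(event));
  return event;
}

// Minimal in-memory channel stub for local experimentation.
function createStubChannel() {
  const published = [];
  return {
    publish: (topic, message) => published.push({ topic, message }),
    published
  };
}
```

Note that the order service only cares that the broker received the event; what other services do with it is their concern.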
This pattern allows us to organize software components around business capabilities.
We can deploy new versions of a service independently.
Owning the data allows us to implement schema changes knowing that we won't affect other services.
[Long Pause]
What we will see in this small section is why Node.js and Cassandra are a great fit for Microservices.
It will be very short, because I don't want this presentation to turn into a marketing or sales pitch...
Node.js is really good for IO Bound scenarios.
There are only one or a few ways to implement a given functionality; for example, most IO libraries do not include synchronous methods, just async ones.
Node.js performs very well with a high number of concurrent connections, where most of the time is spent waiting for IO.
Well, and with Cassandra:
You get tunable consistency: for any given read or write operation, we can decide how consistent the requested data must be.
The peer-to-peer architecture, which can include several identical nodes, protects us from data loss.
It's not just a key-value store: via CQL, it provides tabular output and a rich type system, including maps, lists, sets and user-defined types.
[Long Pause]
Let’s have a quick look at the Node.js driver before trying to build our own microservice demo.
The Node.js driver provides:
- Connection pooling
- Automatic failover and retry
- The driver discovers the Cassandra cluster topology automatically.
- Tunable load balancing, retry and reconnection policies.
- Request pipelining, meaning that you can issue multiple requests without waiting for a response.
- Client-server SSL support.
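To show what using the driver looks like, here is a small sketch of the options for a `cassandra-driver` client. The contact points and keyspace name are hypothetical; the actual connection code is shown as comments since it requires a running cluster.

```javascript
// Sketch (assuming the 'cassandra-driver' npm package and hypothetical hosts):
// the client maintains the connection pool and discovers the rest of the
// cluster topology from the seed nodes listed here.
const clientOptions = {
  contactPoints: ['10.0.0.1', '10.0.0.2'],  // seed nodes; others are discovered
  keyspace: 'killr_video'                   // hypothetical keyspace name
};

// In a real service:
// const cassandra = require('cassandra-driver');
// const client = new cassandra.Client(clientOptions);
// client.execute('SELECT * FROM comments_by_video WHERE videoid = ?',
//   [videoId], { prepare: true }, callback);
```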
I tried to illustrate how automatic failover works.
[Next] If a Cassandra node fails or becomes unreachable
[Next] the driver automatically tries other nodes in the cluster and schedules reconnections to the dead nodes in the background.
It can try with another node in the same datacenter.
[Next] Or, in case the local datacenter becomes unavailable,
[Next] it will issue the request on a node of another datacenter.
The Cassandra protocol is a request-response based communication mechanism but it also features notification of events.
These events can be:
- A new node being added.
- An existing node being moved or removed
- A change in the schema.
- Or a node status change: a node went DOWN or back UP.
To discover the nodes that are part of the cluster, the driver initially fetches the topology information and then uses this notification mechanism to keep the cluster information up to date.
In the driver, you can select the load balancing policies, to define which node should be the coordinator of each query.
The driver has 3 built-in load balancing policies:
- Round robin policy
- Datacenter Aware policy.
- Token Aware policy.
You can also create your own load balancing policy.
With the retry policy, you can choose whether a query should be retried when a specific error occurs.
With the reconnection policy, you can define what should be the delay for the reconnection attempts, in case a node goes down.
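To illustrate what a reconnection schedule does, here is a sketch in the spirit of an exponential backoff policy: delays between reconnection attempts double up to a cap. The class and method names are illustrative, not the driver's exact API surface.

```javascript
// Sketch of a reconnection schedule: the driver asks the policy for an
// iterator of delays (in ms) and waits that long between attempts to
// reconnect to a node that went down.
class ExponentialBackoffSchedule {
  constructor(baseDelayMs, maxDelayMs) {
    this.baseDelayMs = baseDelayMs;
    this.maxDelayMs = maxDelayMs;
  }
  // Returns an infinite iterator of delays that doubles until capped.
  *newSchedule() {
    let delay = this.baseDelayMs;
    while (true) {
      yield delay;
      delay = Math.min(delay * 2, this.maxDelayMs);
    }
  }
}
```

Doubling with a cap keeps reconnection attempts cheap while a node is down for a long time, without ever giving up on it.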
Here is a code sample of a load balancing policy.
I don't want to dig deep into it, as you probably won't need to implement your own policy (the built-in ones are suitable for most cases).
But it's just a way to show you that, with a couple of lines of code, you can override the behaviour of the driver.
In this case, by inheriting from LoadBalancingPolicy base class and yielding the hosts, you are able to build your own policy.
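Since the slide code isn't reproduced here, the following is a simplified sketch of the idea. The real driver policy exposes more (such as an `init` method and a callback-based `newQueryPlan`); this sketch keeps only the core: yield candidate hosts in the order they should be tried. The whitelist behaviour and all names are illustrative.

```javascript
// Simplified sketch of a custom load balancing policy: a generator yields
// hosts in order; the driver would use the first available one as the
// coordinator for the query.
class WhitelistPolicy {
  constructor(hosts, allowedAddresses) {
    this.hosts = hosts;
    this.allowed = new Set(allowedAddresses);
  }
  // Yields only whitelisted hosts, preserving their original order.
  *newQueryPlan() {
    for (const host of this.hosts) {
      if (this.allowed.has(host.address)) {
        yield host;
      }
    }
  }
}
```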
Do you fancy Ecmascript 6?
The driver enables you to use the latest ES6 features.
For the load balancing policies, you can use generators (the yield keyword), as the driver uses the iterator protocol.
For encoding and decoding Cassandra maps and sets, you can configure the driver to use Ecmascript 6 Maps and Sets.
Cassandra native protocol supports multiple requests to be sent without waiting for a response. This enables higher levels of parallelism with just one connection.
What the graph is trying to show is that, thanks to request pipelining, the driver does not have to wait for the response to be received to issue the following requests.
Other database protocols that don't support request pipelining (I don't want to name names), force clients to maintain lots of connections to the same host to achieve the same level of concurrency.
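From the application's point of view, pipelining simply means issuing several requests on the same client without awaiting each one. Here is a sketch, with a stub standing in for the driver client so the in-flight behaviour is observable; the query and table name are illustrative.

```javascript
// Sketch: with request pipelining, one client (one connection) can have
// many requests in flight at once. Responses are matched to requests by
// the protocol's stream ids, so a single connection suffices.
function fetchAllVideos(client, videoIds) {
  // Issue all requests up front, then wait for all responses.
  const requests = videoIds.map(id =>
    client.execute('SELECT * FROM videos WHERE videoid = ?', [id])
  );
  return Promise.all(requests);
}

// In-memory stub that tracks how many requests were in flight at once.
function createStubClient() {
  let inFlight = 0, maxInFlight = 0;
  return {
    execute(query, params) {
      inFlight++;
      maxInFlight = Math.max(maxInFlight, inFlight);
      return new Promise(resolve =>
        setImmediate(() => { inFlight--; resolve({ rows: params }); }));
    },
    get maxInFlight() { return maxInFlight; }
  };
}
```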
The driver implements failover and retry, so you don't have to implement your own failover/retry functionality.
The driver uses connection pooling to the Cassandra nodes, so you should reuse the same client instance and not worry about it.
The driver supports fine tuning through policies and settings, but the default policies and configuration are suitable for most use cases, so you usually don't need to spend time on this.
[Long pause]
Well, let's try to build a demo of a microservice with Node.js and Cassandra.
The service will focus on user feedback for videos (comments and ratings).
It is based on a sample schema for a video sharing site that Patrick McFadin has been using to demonstrate data modeling techniques with Cassandra.
There are other sample applications using this schema, most notably there is a live site built by Luke Tillman from DataStax: killrvideo.com
And here it comes: a lot of code in presentation slides...
We will be using these two column families for comments, one partitioned by videoId and the other by user.
[Long Pause]
And we will be using these two column families for ratings.
To keep the presentation short, I will show the code for the comments functionality, not the ratings; on GitHub you have all the functionality if you want to look into it.
We will be using HTTP and just GET and POST verbs.
We will implement it with ExpressJS, but I will also provide samples with Seneca.
Let's start with a simple one:
Expose a GET route to get all the comments and return them as JSON.
We reuse the same Cassandra client instance and execute the query with different parameters.
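This is roughly what the route looks like. The table and column names are assumptions based on the killrvideo-style schema, and the handler is written as a plain function so it can be exercised without a running server; in the real app it would be wired with something like `app.get('/comments/:videoId', getCommentsHandler(client))`.

```javascript
// Sketch of the GET route logic: fetch comments for a video and return
// them as JSON. `client` is the shared cassandra-driver Client instance.
function getCommentsHandler(client) {
  return function (req, res) {
    const query = 'SELECT * FROM comments_by_video WHERE videoid = ?';
    client.execute(query, [req.params.videoId], { prepare: true },
      function (err, result) {
        if (err) {
          return res.status(500).json({ error: err.message });
        }
        res.json(result.rows);
      });
  };
}
```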
Using the repository pattern, we moved all the data access logic and result adaptation into the repository class.
We pass the Cassandra driver instance as a dependency to the repository class.
Now the code there is cleaner. This was a personal choice to provide more concise code snippets; we could have it all in just one module, as there are very few lines of code.
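Here is a sketch of that repository shape (class and method names are illustrative, not the exact demo code). Injecting the client makes the repository trivial to test with a stub.

```javascript
// Sketch of a comments repository with the driver client injected as a
// dependency.
class CommentRepository {
  constructor(client) {
    this.client = client;  // shared cassandra-driver Client instance
  }
  getByVideo(videoId, callback) {
    const query = 'SELECT * FROM comments_by_video WHERE videoid = ?';
    this.client.execute(query, [videoId], { prepare: true },
      (err, result) => {
        if (err) return callback(err);
        // Adapt driver rows to plain objects the route can serialize.
        callback(null, result.rows.map(r => Object.assign({}, r)));
      });
  }
}
```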
Now, let's look at a more complete example.
The routing part is very similar, we route POST requests, insert the comment and return the id of the comment.
All the logic is on the repository.
As we have a denormalized schema, we have to make 2 inserts.
We execute them in a single batch, which translates into a single request to Cassandra.
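Here is a sketch of that denormalized insert. The table and column names are assumptions based on the killrvideo-style schema; `client.batch()` sends both mutations as one logged batch, i.e. a single request.

```javascript
// Sketch: insert the same comment into both tables as one batch.
function insertComment(client, comment, callback) {
  const queries = [
    {
      query: 'INSERT INTO comments_by_video (videoid, commentid, userid, comment) VALUES (?, ?, ?, ?)',
      params: [comment.videoId, comment.commentId, comment.userId, comment.text]
    },
    {
      query: 'INSERT INTO comments_by_user (userid, commentid, videoid, comment) VALUES (?, ?, ?, ?)',
      params: [comment.userId, comment.commentId, comment.videoId, comment.text]
    }
  ];
  // Both statements travel in a single request to the coordinator.
  client.batch(queries, { prepare: true }, callback);
}
```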
Let's complete the example by pushing a notification of the comment insertion to the bus, the message broker.
Consumers of that event could perform specific actions with that message:
For example, a user profile service that outputs stats related to comments could invalidate its internal cache.
Another example would be invalidating a service cache when a piece of content that needs to be rendered in a webpage (in this case, a new comment) changes.
The internal implementation of a message broker client is out of scope of this presentation.
But let's look at what it should expose.
It should expose typed methods to publish events, like the one we invoked from the repository.
It should emit events based on server events.
In this example, I'm using an imaginary library, but the idea is to subscribe to a topic and emit a client event from it.
Listeners of this event within the service could perform specific actions based on it.
So that is the complete example of a service bus client class; there's not much real code in this example, but I hope it helps you build your own.
Just as a quick note, here is the same action to retrieve comments but using SenecaJs.
Seneca is a tool that separates an application into small actions.
An action is not aware of whether it is being exposed through HTTP or any other protocol.
Let's recap
The code base is easy to traverse: there are 3 classes, setup, comment and rating.
It's around 150 lines of code; it could be less without code comments and with a less verbose programming style.
It is organized around a functionality... it could even be split up into a comment service and a rating service.
And that's about it, here is the link to the sample project, killr-service.
Check out the DataStax dev blog, where we publish news about Cassandra and its drivers.