This is the story of LinkedIn's journey from a monolithic web application to a microservice-based architecture, some of the challenges they faced along the way, and the tools they built to make this transition possible, including the Rest.li and Deco frameworks.
1. From a monolith to microservices + REST
The evolution of LinkedIn’s service architecture
by Steven Ihde and Karan Parikh (LinkedIn)
4. Leo
● Our original codebase
● Java, Servlets, JSP, JDBC
[Diagram: Leo backed by Oracle]
5. Remote Graph
● Graph: member-to-member connection graph
● Complex graph traversal problems not suited to SQL queries
● RPC was used to keep it separate from Leo
● Our first service
7. Mid/Back Tier Services
● “Back” tier services encapsulate data domains
● “Mid” tier services provide business logic
● We applied the service pattern to many domains, e.g. member profiles, job postings, group postings
8. Front Tier Services
● “Front” tier services aggregate data from many domains
● Transform the data through templates to present to the client
● Should be stateless for scaling purposes
9. Service Explosion
● Over 100 services by 2010
● Most new development occurring in services, not Leo
● Site release every two weeks
10. Architectural Challenges
● Test failures
● Incompatibilities
● Complex orchestration
● Rollback difficult or impossible
● Complex dependencies between services
11. Microservices?
● Services were fine grained
● But monolithic build and release process did not allow us to realize the benefits of microservice architecture
12. Solutions
● Continuous delivery
● Break apart the code base
● Devolution of control
● Strict backwards compatibility
● Better defined boundaries between tiers
13. Continuous Delivery
● Shared trunk
● Pre- and post-commit automated testing
● Easy promotion of builds to production environment
14. Decentralize Codebase
● Separate, independently buildable repositories
● Shared trunk within each repository
● Versioned binary dependencies between repositories
15. Devolution of Control
● Service owners control release schedule, release criteria
● Service owners are responsible for backwards compatibility
● Services must release independently
16. Backwards Compatibility
● Insulates teams from each other at runtime
● Allows service owners to deploy on their own schedule without impacting clients
17. Boundaries Between Tiers
● Limit aggregation to the front tier
● Limit crosstalk in the back tier: “superblocks”
19. Java RPC
● Difficult to maintain backwards compatibility
● Verb-centric APIs
● Use-case-specific APIs
● Difficult to navigate the proliferation of APIs
21. What is Rest.li?
“Rest.li is an open source REST framework for building robust, scalable RESTful architectures using type-safe bindings and asynchronous, non-blocking I/O.”
Primarily JSON over HTTP.
23. The Rest.li stack
● Rest.li: Data layer and RESTful operations
● D2: Dynamic discovery and load balancing
● R2: Network communication
24. Request Response (R2)
● REST abstraction that can send messages over any application layer protocol (HTTP, or PRPC, an old custom LinkedIn protocol)
● Client: fully asynchronous, built on Netty
● Server: Jetty, or Netty (experimental)
25. Dynamic Discovery (D2)
● Apache ZooKeeper
● Dynamic server discovery
● Client-side software load balancing
● D2 service
26. Rest.li
● Data defined using PDSCs (Pegasus Data Schemas)
● RESTful API that developers use to build services
● CRUD + finders + actions
● API and data backwards compatibility checking
29. Aside: Normalized Domain Models
● Links over inclusion (denormalization)
● URNs are fully qualified foreign keys
[Diagram: InfluencerPost (Long id, String title, String content, URN author) linking via the author URN to Member (Long id, String firstName, String lastName, String summary, URN company)]
30. What is Deco?
● URN resolution library
● What data you want, not how you want it
[Diagram: a chain of service calls resolved one after another over time]
31. Deco Example: Influencer Post
[Diagram: a rendered post (“Dummy Post” by Karan Parikh at LinkedIn: “Hi QCon!”) assembled from the /influencerPosts, /profiles, and /companies resources]
35. How Rest.li enables Microservices
● Rest.li + D2 facilitate domain-specific services
● Services can easily configure clients via D2
● D2 helps us scale the architecture
36. How Deco enables Microservices
● Deals with service explosion
● Abstracts away services from clients
37. Challenges
● Coordinating a massive engineering effort (LiX to the rescue!)
● Ensuring uniform RESTful interfaces
● Performance
[Screenshot: Rest.li API Hub]
38. Wins
● All languages talk to the same service
● Developer productivity
● Reduction of hardware load balancers
● Ability to expose APIs directly to third parties
Databus (http://data.linkedin.com/projects/databus) is LinkedIn’s database change capture system
Complex dependencies arose due to fuzzy boundaries between tiers
Fowler: “bare minimum of centralized management of these services”
Discussed extensively in Jason Toy’s talk
Shared trunk is good, but need to provide some decoupling at the source code level
The second part of this talk is going to be about how we solved some of the problems Steve mentioned, and how Rest.li and Deco power our current microservice based architecture here at LinkedIn.
Rest.li is LinkedIn’s REST communication framework. It was created by LinkedIn and open sourced in 2012. It allows you to build rich, expressive RESTful services and gives you type-safe bindings to call these services using async, non-blocking I/O. Primarily JSON over HTTP.
Polyglot - We initially started as a Java company, but as we grew and acquired more companies, more languages were added to our stack: Java, Scala, Python, Node.js (mobile). We wanted a service communication framework that would enable us to do cross language communication easily. Since Rest.li is JSON over HTTP at its core, this is pretty easy to do. It frees us from having to set up proxy layers converting Java objects into other consumable forms.
REST - We wanted a service communication framework that prevents engineers from having to learn how to interact with each new service they talk to, and ensures uniformity across service interfaces. With Rest.li all communication is RESTful, so if you know how to talk to one service you know how to talk to all of them. Lots of tooling work has also been done to ensure this uniformity.
When people refer to Rest.li they are typically talking about all three layers in combination. However, Rest.li only explicitly depends on R2; D2 is optional, but almost all our production services use all three.
R2 is the layer responsible for communicating over the network: a REST abstraction over an application layer protocol.
D2 is our dynamic discovery layer built on top of Apache ZooKeeper. By dynamic discovery I mean that clients do not have to hardcode server URLs in their code. Clients simply need to specify what service or resource they want to talk to and D2 converts that name into a URL. At the time of doing this name to URL translation it also applies client side software load balancing.
The main concept in D2 is that of a D2 service. A D2 service represents a logical network API. It doesn’t necessarily have to be a REST service, but a large majority of our D2 services map to Rest.li RESTful resources.
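The name-to-URL translation plus client-side balancing that D2 performs can be sketched in a few lines. This toy resolver is only an illustration of the idea, not D2's actual API: the real D2 watches live server lists in ZooKeeper and supports richer balancing strategies, whereas here a static map and round-robin selection stand in, and all class and host names are made up.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Toy sketch of D2-style dynamic discovery: clients name a logical service,
// and the resolver turns that name into a concrete URL, spreading requests
// across the currently known replicas (round-robin as a stand-in for D2's
// client-side load balancing).
class ToyD2Resolver {
    private final Map<String, List<String>> registry; // service name -> live server URLs
    private final AtomicInteger counter = new AtomicInteger();

    ToyD2Resolver(Map<String, List<String>> registry) {
        this.registry = registry;
    }

    // Translate a logical service name into a concrete URL.
    String resolve(String serviceName) {
        List<String> servers = registry.get(serviceName);
        if (servers == null || servers.isEmpty()) {
            throw new IllegalStateException("no live servers for " + serviceName);
        }
        int i = Math.floorMod(counter.getAndIncrement(), servers.size());
        return servers.get(i) + "/" + serviceName;
    }
}
```

The point of the indirection is that clients never hardcode hosts: when servers join or leave, only the registry changes, which in the real system happens automatically via ZooKeeper watches.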
Most application developers only interact with this layer. This is the layer that has APIs for modelling the data your resource will serve, as well as the RESTful methods it will expose. The data + methods together comprise a Rest.li resource. In our DCs a Rest.li resource typically maps to a D2 service.
Data is defined using the Pegasus Data Schema format, with a syntax derived from Avro.
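As a rough illustration of that Avro-derived syntax, a schema for the InfluencerPost model from the earlier slide might look something like the following. The namespace and doc strings are invented for the example, and real Pegasus schemas offer more (optional fields, unions, typerefs), so treat this as a sketch of the shape rather than a verbatim production schema.

```json
{
  "type": "record",
  "name": "InfluencerPost",
  "namespace": "com.example.models",
  "doc": "Hypothetical schema for illustration; fields follow the slide's example.",
  "fields": [
    { "name": "id", "type": "long" },
    { "name": "title", "type": "string" },
    { "name": "content", "type": "string" },
    { "name": "author", "type": "string", "doc": "URN of the Member who wrote the post" }
  ]
}
```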
Finders - a way to model arbitrary queries on a set of entities via query params.
Actions - arbitrary RPC when you cannot model something RESTfully.
Rest.li also features built-in API and data backwards compatibility checking. If the framework detects a backwards incompatible change it will fail the build and show the developer exactly how they’ve made a backwards breaking change.
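A drastically simplified sketch of what such a check does (the real checker operates on Pegasus schemas and REST specs, not plain maps): model each schema as a map of field names to types, flag deletions and type changes as breaking, and allow additions, which existing clients can safely ignore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy backwards-compatibility check, illustrating the idea only: a field a
// client already reads must not disappear or change type between releases.
class ToyCompatChecker {
    static List<String> breakingChanges(Map<String, String> oldSchema,
                                        Map<String, String> newSchema) {
        List<String> breaks = new ArrayList<>();
        for (Map.Entry<String, String> field : oldSchema.entrySet()) {
            String newType = newSchema.get(field.getKey());
            if (newType == null) {
                breaks.add("removed field: " + field.getKey());  // clients still read it
            } else if (!newType.equals(field.getValue())) {
                breaks.add("changed type of " + field.getKey()); // deserialization breaks
            }
        }
        // Added fields are backwards compatible, so they are not flagged.
        return breaks;
    }
}
```

Failing the build on a non-empty result is what insulates teams from each other: a breaking change is caught at commit time rather than at a client's runtime.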
To understand why we need deco we need to understand how we model the entities served by Rest.li resources.
All domain models at LinkedIn are normalized. That means that if we have one domain model that needs information from another domain model we model this using a link from one domain to the other. This is done via URNs. A URN is a fully qualified foreign key. Just by looking at a URN we can figure out which domain it is referencing.
Using links over inclusion gives clients the power to request whatever data they want from the linked domain model. In this example the InfluencerPost model simply links to the Member model that is the author for the particular post.
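A toy parser shows why a fully qualified key is useful: the entity type is recoverable from the key itself, so a resolver knows which domain (and hence which service) to ask. The urn:li:&lt;type&gt;:&lt;id&gt; shape below matches the style of LinkedIn's public URNs, but treat the exact format and class here as illustrative.

```java
// Sketch: parsing a fully qualified foreign key of the form urn:li:<type>:<id>.
// Unlike a bare numeric foreign key, the URN carries the owning domain with it.
class Urn {
    private final String entityType;
    private final long id;

    Urn(String entityType, long id) {
        this.entityType = entityType;
        this.id = id;
    }

    String entityType() { return entityType; }
    long id() { return id; }

    static Urn parse(String raw) {
        String[] parts = raw.split(":");
        if (parts.length != 4 || !parts[0].equals("urn") || !parts[1].equals("li")) {
            throw new IllegalArgumentException("not a URN: " + raw);
        }
        return new Urn(parts[2], Long.parseLong(parts[3]));
    }
}
```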
Deco follows URNs and fetches the requested data from the other domain model. You simply tell deco what data you want from the linked domain model, and not how you want it. Deco uses the URN to fetch requested data from the appropriate service.
Very useful in situations like that shown in the diagram in which the client needs to talk to multiple services, following links from each. This is commonly seen in our frontends which aggregate information from multiple sources. In the diagram the green service needs to call 3 different services, each time using an URN returned by the previous one to call the next one. Deco is built precisely to solve this problem and make it easy for application developers to aggregate data from many services.
The frontend will display the post title, the author of the post, the company they currently work at, and the body of the post. On the right hand side we have the 3 services that data will be fetched from.
The first thing we do is make a GET request to the /influencerPosts Rest.li resource and request the post with ID 123. We use a projection to specify what fields we want from the requested domain model. In this example we want the title, content, and the author. The next thing we do is RESOLVE the author URN and request the firstName, lastName, and company URN from the member domain. Finally, we RESOLVE the company URN and request the company name. Deco is the library responsible for doing the URN resolution, which happens twice in this example.
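The walkthrough above can be condensed into a runnable toy: hardcoded maps stand in for the three services, and each RESOLVE step is a lookup that follows a URN to the next "service". The names and data come from the slide's example, but the real Deco issues Rest.li requests with projections rather than map lookups.

```java
import java.util.Map;

// Toy end-to-end version of the Influencer Post example: fetch the post,
// follow the author URN to the member "service", then follow the company URN
// to the company "service", and assemble the rendered view.
class ToyDeco {
    // Stand-ins for the three services' data stores.
    static final Map<String, Map<String, String>> STORE = Map.of(
        "urn:li:influencerPost:123", Map.of("title", "Dummy Post", "content", "Hi QCon!",
                                            "author", "urn:li:member:42"),
        "urn:li:member:42", Map.of("firstName", "Karan", "lastName", "Parikh",
                                   "company", "urn:li:company:1337"),
        "urn:li:company:1337", Map.of("name", "LinkedIn"));

    // "Resolve" a URN by looking up its record; Deco would call the owning service.
    static Map<String, String> resolve(String urn) { return STORE.get(urn); }

    static String renderPost(String postUrn) {
        Map<String, String> post = resolve(postUrn);
        Map<String, String> author = resolve(post.get("author"));     // first URN hop
        Map<String, String> company = resolve(author.get("company")); // second URN hop
        return post.get("title") + " by " + author.get("firstName") + " "
             + author.get("lastName") + " at " + company.get("name")
             + ": " + post.get("content");
    }
}
```

Note that the client code expresses only which fields it wants from each hop; where each record lives and how to fetch it stays hidden behind the resolver, which is the property Deco provides.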
How does an application developer know that the member domain model has fields called firstName, lastName, and a company URN? We have an internal tool called the Rest.li API Hub that documents all our Rest.li resources and the models that they pertain to. Using this engineers can easily see what data they will get back. We’ve open sourced the Rest.li API Hub as well and I will be talking about it in the next few slides.
The previous example shows a scenario where a library like deco really shines. As a client we had to aggregate data from three separate services, but we didn’t have to write a lot of code to do so since deco took care of it for us.
Rest.li and D2 have really helped in LinkedIn’s transition towards a microservice architecture. Rest.li allows us to write rich, domain-specific RESTful resources, and by mapping these RESTful resources to D2 services, they can be consumed by clients very easily.
Services can easily configure service properties like HTTP timeout, HTTP compression, max response size etc. and have this configuration pushed out automatically to all their clients via ZooKeeper.
D2 has really helped us scale our microservice based architecture as well. Apart from the dynamic discovery property I mentioned, by using client side software load balancing and client side degradation we have made our architecture more robust to failures.
Deco helps us mitigate the problem of service explosion that one might see in a microservice based architecture. Deco makes clients powerful without making them fat clients by introducing the Deco DSL. It deals with the problem of clients potentially having to talk to dozens of services to display information to the user. By abstracting away the services and using URNs, Deco allows us to keep our client code simple while still having the power of a microservice based architecture. It allows us to build fine grained services without having to write lots of code on the client side to use the information.
Massive engineering effort - it took around three years to convert our RPC based stack to use Rest.li, and this effort involved hundreds of software engineers and site reliability engineers. We had to do this transition without affecting our planned product roadmap and without causing site issues and outages. LiX, our A/B testing framework, was very useful in the rollout, as we could use it to control how much traffic was being sent to a Rest.li based service vs. an RPC based service for the same domain. You can read more about LiX on our Engineering Blog.
Uniform RESTful interfaces - Our engineers were used to thinking in terms of fine grained RPC interfaces, and we had to get them comfortable thinking about generic RESTful APIs. Even though Rest.li is very opinionated about REST, there are still certain patterns, design decisions, and naming conventions that we wanted to use uniformly across the company. For instance, if the client is assigning a key during entity creation we model it as an UPDATE (PUT) rather than a CREATE (POST). To get there we made sure our documentation was very thorough. We also established a REST model review team that looked at teams’ REST models and APIs once a week to make sure that standards were being adhered to, our data models were clean and consistent, and conventions were being followed. We also created a set of common domain models that can be used across the company, including models for time ranges, URNs, phone numbers, etc. The Rest.li API Hub was very useful here as well: by creating a directory of all the Rest.li APIs within LinkedIn, it lets developers easily find which service they need to talk to for a particular domain, how to model their entities, how to design their REST API, and so on.
Performance - JSON serialization worked well for a majority of our services. But for certain high performance extremely low latency systems we’ve explored other serialization options as well.
Regardless of the language a client is written in, it can talk to the same Rest.li service, since Rest.li is based on REST and JSON.
Productivity - because of uniform REST APIs and tooling support, if you know how to talk to one Rest.li service you know how to talk to all of them. Developers don’t have to learn a new API for each service they interact with. Both Rest.li and the REST model review committee helped ensure this uniformity.
Reduction of load balancers - apart from being an extra layer of maintenance and cost, these hardware load balancers were also a single point of failure. By distributing the load balancing logic among the clients we make our architecture more resilient to failures, and it becomes easier to do cross-colo calls and DC failovers.
In our previous architecture we had a separate service that exposed public RESTful endpoints that third parties would call to access our RPC services. With Rest.li we can expose the APIs directly, WITHOUT the need for this intermediate service.
This is a story of how we went from this...
...to this!
And how we came to love microservices here at LinkedIn.