
Event storage in a distributed system


Event storage offers many practical benefits to distributed systems, providing a complete record of state changes over time, but there are a number of challenges when building an event store mechanism. Stephen Pember explores some of the problems you may encounter and shares real-world patterns for working with event storage.



  1. 1. I’d like to start with a story. A few months back I was at work, and received a Twitter notification
  2. 2. @svpember A guy I know tweeted this at me:
  3. 3. @svpember Oliver Gierke, who’s the Spring Data lead at Pivotal, sent out this tweet. <read> So I, uh, rather enthusiastically responded…
  4. 4. @svpember by deluging the poor guy with tweets until he responded with ‘heyyy this has been helpful, thanks! *Upside-down smiley face*’ Other people jumped on the conversation too, of course, but I think I was the most. ah. enthusiastic.
  5. 5. @svpember One of the other participants tweeted this: “maybe you should make a blog post?” Which is wonderful, right. Validation from strangers on the internet! I thought better yet, I’ll make a talk.
  6. 6. Event Storage in a Distributed System Steve Pember CTO, ThirdChannel steve@thirdchannel.com Software Architecture Conf 2018: NYC @svpember So I wrote it up, submitted it, and here we are. My name is Steve. I work for a company called ThirdChannel, out of Boston. Now, I realized that I accidentally hit the 90 minute checkbox when submitting this talk, so, ah, the scope of this is a little bit more involved than the title initially suggests. Let’s talk about Events
  7. 7. It’s all about Events This talk is all about events. You’re going to be sick of hearing the word by the end of this presentation. However, bear with me. it is important. I feel that systems that are not representing transitions within themselves as events, and are not actively listening to or taking advantage of these internal events… not being quote / unquote ‘reactive’… are missing out on huge advantages in flexibility, reporting, and scalability both in terms of deployments and operational / developmental scalability. Now, after that bold statement… well, we’ll get into all of that, but first we should discuss the most fundamental question…
  8. 8. What is an Event? What is an event? Does anyone want to take a stab at defining what we should consider an event?
  9. 9. @svpember Event • Something that has occurred in the past within some software • Intended for consumption by other software • Distribution is often asynchronous • Often contains data detailing the event • Immutable Well, I would classify it as… a piece of data that signifies some action has been performed in the past within some software. The two most important bits are in the past and immutable
  10. 10. @svpember So, events like “Order placed”… are all great. They denote that something has happened. Immutability is another important part. Let’s say ‘ItemsShippedEvent’ was emitted with a value of 5. It would potentially be disastrous for something to later change that value to, say, 1000, right? Would disrupt all meaning
  11. 11. –Martin Fowler “Domain Event” https://martinfowler.com/eaaDev/DomainEvent.html Things happen. Not all of them are interesting, some may be worth recording but don’t provoke a reaction. The most interesting ones cause a reaction. Many systems need to react to interesting events. Often you need to know why a system reacts in the way it did. Because it wouldn’t be an Architecture talk without a Fowler quote… Another way to think of events.. which frames most of this discussion, is a quote from Martin Fowler: don’t read! Basically: “Important events cause reactions elsewhere in the system, and it’s often important to understand why those reactions occurred”.
  12. 12. As an aside… Reacting to events may be nothing new to Javascript or frontend developers • Your browser’s DOM, Javascript, and I suppose UIs in general are full of events. Literally anything you do on the browser generates an event. Move the mouse, click a box, type a letter, let go of a letter, etc. • While the knowledge of this talk is transferable to the frontend to some extent, the majority of this talk is focused on the server side. Server side doesn’t traditionally deal with a lot of events, I’d say. particularly if you started your dev life and career with big frameworks like rails, grails, django, etc
  13. 13. It’s not hard to program in terms of events. Generally, one or more events are created when a user successfully performs an action or Command. (slide of various event names, will reuse this aesthetic later). They represent successful deltas or actions that have occurred in the past. Now, here, we have code which accepts some incoming Command object to create a new TodoList, validates it, generates two events based on that command, saves the events, saves a projection of the current state of the TodoList entity (though this is optional, as I could recreate the state of the todo list entirely from the events), and transmits them.
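That flow can be sketched roughly like this. This is a hypothetical Python rendering, not the talk's actual code; the command and event names, the in-memory store, and the handler function are all illustrative stand-ins:

```python
from dataclasses import dataclass
from typing import List
import uuid

@dataclass(frozen=True)
class CreateTodoListCommand:
    name: str

@dataclass(frozen=True)
class TodoListCreatedEvent:
    list_id: str
    name: str

event_store: List[object] = []  # stand-in for a persistent event journal
published: List[object] = []    # stand-in for the pub/sub transport

def handle_create_todo_list(command: CreateTodoListCommand) -> TodoListCreatedEvent:
    # 1. Validate the incoming command
    if not command.name.strip():
        raise ValueError("a todo list needs a name")
    # 2. Generate event(s) describing what happened
    event = TodoListCreatedEvent(list_id=str(uuid.uuid4()), name=command.name)
    # 3. Persist the event (a projection of current state could also be saved here)
    event_store.append(event)
    # 4. Transmit it to other interested parties
    published.append(event)
    return event
```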
  14. 14. My domain objects or ‘Entities’, start becoming highly functional as they acquire methods to manipulate them by applying events. This is certainly not production code, but you can see how my entities start acquiring handlers on themselves that when provided various events know how to use that event to update their internal state. In production, I’d probably have the entity use an internal mutable builder, and the builder receive the events, which would then spit out a validated, immutable Entity. but Alas, this is an example.
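A minimal sketch of that idea, assuming a simple isinstance-based dispatch (as the slide notes, production code would more likely use an internal mutable builder that emits a validated, immutable entity):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TodoListCreated:
    list_id: str
    name: str

@dataclass(frozen=True)
class ItemAdded:
    list_id: str
    item: str

class TodoList:
    """Entity that rebuilds its own state by applying events to itself."""

    def __init__(self):
        self.list_id = None
        self.name = None
        self.items = []

    def apply(self, event):
        # Each event type gets a handler that updates internal state
        if isinstance(event, TodoListCreated):
            self.list_id, self.name = event.list_id, event.name
        elif isinstance(event, ItemAdded):
            self.items.append(event.item)
        return self  # chainable, so a stream of events can be folded in
```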
  15. 15. @svpember A slide back I mentioned that Events need to be transmitted. Well, these events need to be seen by others to be useful. It’s one thing to have my entity only see its events, but it’s entirely another thing to share… and mix events from across the system. And so, we need to have some method of transmitting these events to other interested parties, both Internal and External. For internal, this typically involves some asynchronous publish/subscribe mechanism. Tools that I’ve used successfully for this purpose have included a library called Project Reactor, and Reactive Streams via the rxJava library
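An internal pub/sub mechanism can be sketched in a few lines. This toy version is synchronous and in-process; Project Reactor or rxJava would additionally make delivery asynchronous:

```python
from collections import defaultdict

class EventBus:
    """Toy synchronous publish/subscribe mechanism (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event):
        # Every handler registered for this event's type sees the event
        for handler in self._subscribers[type(event)]:
            handler(event)
```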
  16. 16. @svpember Externally, these events can be transmitted either via point to point http or via some asynchronous message queue… which as we’ll see later is my preferred method.
  17. 17. At this point, you might be saying… ok, cool, but why? That’s one point of criticism I often get for my talks: I mention all these great things I’m working with but neglect to really hammer home the ‘WHY’ of it all. So… why? Why should you care about any of this? So far it just looks like I’m adding a bunch of extra hassle for you.
  18. 18. The reason is that events, these smaaaaallll bits of information are collectively extremely powerful. The point of this presentation, the synopsis, the end goal, is to try and show you that tracking events, persisting them, and treating them as first-class citizens within your system is a wise idea with loads of potential benefit. AND that there are some caveats to be aware of when we talk about how to store these events and work with them within a distributed environment. However, there are some steps in the way to get there.
  19. 19. @svpember Overview • Event-Oriented Distributed System Architecture Today, we’re going to discuss some information that I’ve broken down into the following topics: <read> The architectural background of this talk. This is an architectural conference, and it’s important. This will cover some concepts and architectural designs to help prepare your systems to think in terms of events. And because we’re architects, we’ll probably have some boxes and lines drawn up on the screen, because it wouldn’t be an architectural presentation without good ol boxes and lines
  20. 20. @svpember Overview • Event-Oriented Distributed System Architecture • The Power of Events after we get in the mindset of working with events and architecting our systems to operate in an event-first fashion, we’ll look into why you should be excited about having events laying around as first-class citizens within your app. I do this topic second, as I think the architecture portion is the harder pill to swallow… plus I think that getting your head around the existence of events makes it easier to start to see their usefulness. Although I could be wrong, let’s give it a go.
  21. 21. @svpember Overview • Event-Oriented Distributed System Architecture • The Power of Events • Event Storage & Lifecycle After that, we’ll discuss the impact of storing events, different patterns for doing so, and some lifecycle concerns for services in an event based environment
  22. 22. @svpember Overview • Event-Oriented Distributed System Architecture • The Power of Events • Event Storage & Lifecycle • Day to Day Concerns / Working with Events And then finally, some additional details of working with events that don’t necessarily have to do with storing them. Alright… let’s begin
  23. 23. Let’s start with Microservices First, a few minutes on Microservices
  24. 24. I’ve borrowed this slide before. Thanks Alvaro! I like it because it’s honest. You start with a mess and if you’re not careful you end up with a distributed mess Anyway… • How many of you have attended SACon before? This is, I think, my third or fourth time over 4 years. I’m pretty sure the first two years were almost exclusively talks about microservices. I know I contributed to it, eh? And there’s a good reason This notion of Microservices has been a great transformational thing in software development and architecture. Even if you think it’s a rehash of SOA, it still has been promoting the virtues and popularity of distributed systems with the larger community… which I think is a good thing. • Now… Just to get a general poll… who’s working with them? And who here likes working with them? Any hands go down? Aw, some jaded folks
  25. 25. @svpember The Promise of Microservices • Reduced complexity per service • Easier for developers to understand a single service • Teams work with more autonomy • Independent scaling, deployments, builds • Fault isolation • “Right tool for the job” • Isolation and decoupling • Continuous Delivery and Deployment The promise of Microservices is very alluring, yeah? Right out of the gate, we immediately reduce the complexity of our codebase by making it several smaller codebases My favorite: It allows teams to work with great autonomy, with improved isolation and decoupling. It allows for independent scaling of services. The most powerful is that microservices allow for Continuous Delivery and Continuous Deployment. Which honestly is… I think the pinnacle of efficiency a software dev team should be striving for. Now I’m being a bit flippant about that because we should be concerned about testing and regressions of our releases, of course… but all of this is another topic entirely.
  26. 26. @svpember … Some Caveats • Vastly increased infrastructure complexity • So much Ops • Teams need to handle all lifecycle steps of service deployment • Conceptual difficulty with multiple service deployments • Potential performance hits for intra-service comms As useful and as powerful as all of that is… there are absolutely some tradeoffs when using microservices. You immediately… IMMEDIATELY have increased complexity in your infrastructure. And going back to my point that micro services have been good for growing awareness within the community… the rise of tooling in this space has just been insane. Kubernetes, Hashicorp’s entire business model… it’s great stuff The point here is that if your team isn’t ready to shift the complexity of the codebase into infrastructural and ops complexity, you should probably hold off.
  27. 27. A few things bothered me… That being said, I still think that the Microservice approach is very useful. However, as it’s been growing, three points have always bothered me that I never felt were fully discussed or agreed upon in the presentations I’ve seen and material I’ve read. Just as a preview… did everyone see the keynotes yesterday morning? Cornelia Davis started out her presentation listing issues with distributed systems and it spoke to my soul, man. She basically gave this talk already.
  28. 28. @svpember Questions about Microservices • How should they communicate? First: how should these services communicate?
  29. 29. @svpember You see, when the term ‘microservice’ became the rage, it was my observation that people were building services which utilized point to point synchronous http comms to query, post, etc data between services. There’d be service discovery systems in play, necessary to make other services aware of each other’s existence. These synchronous calls utilize resources (e.g. threads), block, take time… and if a service goes down, what happens if another service is reliant upon it?
  30. 30. @svpember slide of multiple services being needed to support a single hop And I’m aware of netflix’s hystrix and other circuit breaker technologies to help with all of this, but it still seems a lot could go wrong in that chain.
  31. 31. Time to Go Reactive To address point 1, I suggest embracing a design pattern known as ‘Reactive’
  32. 32. @svpember Reactive Systems • Communication between services driven by asynchronous events Has anyone here heard of the ‘Reactive Manifesto’? I’m a big fan of it, but I’m going to put a bit of a spin on its tenets to fit my narrative here. Anyway, when I say ‘Reactive’, we don’t mean ‘React.js’ or Reactive Streams (though I love both of those things). It’s a design philosophy to apply to systems to help them achieve high scalability. The first rule for Reactive Systems is that… On a positive note, based on what I’ve been seeing over the past year or so, the collective opinion is moving away from entirely HTTP to be more event driven, which is great. There are, for example, several talks at this conference on this very subject.
  33. 33. @svpember Reactive Systems • Communication between services done by asynchronous events • Services ‘React’ to observed events Anyway, point two!
  34. 34. @svpember Reactive Systems • Communication between services done by asynchronous events • Services ‘React’ to observed events • Use some Message Broker technology to promote Async and reduce Data loss
  35. 35. @svpember Reactive Systems • Communication between services done by asynchronous events • Services ‘React’ to observed events • Use some Message Broker technology to promote Async and reduce Data loss • Synchronous HTTP calls between services kept to a minimum by reducing the # of synchronous calls, we gain two main benefits: - less resource contention on the thread pools of each of our services - Firewall-like effect: if services die, they don’t cause other systems to fail or have to rely on fallback circuit breaker code - can reduce the number of calls, by collecting data from other services One side effect of this and the previous points is that your platform should become quite fast, as well as have… <next>
  36. 36. @svpember Reactive Systems • Communication between services done by asynchronous events • Services ‘React’ to observed events • Use some Message Broker technology to promote Async and reduce Data loss • Synchronous HTTP calls between services kept to a minimum • Resiliency against failing services - By reducing or eliminating the runtime dependency on other services, each service can function in isolation and will not be brought down by failing services. And that’s my basic overview of Reactive systems. Everyone with me so far? No? Too bad, let’s keep going.
  37. 37. @svpember Questions about Microservices • How should they communicate? • How “large” should a micro service be? • How much responsibility should a single service have? Now, the next two… <read> I feel can be solved or addressed by an architectural design pattern known as…<next>
  38. 38. Domain Driven Design Command Query Responsibility Segregation Domain Driven Design (or DDD) and a related variant called Command Query Responsibility Segregation (aka CQRS) are two architectural patterns intended for very complex systems. These are not trivial things to understand and you should take care if you plan to adopt them. They have great power, though
  39. 39. - Eric Evans “Some objects are not defined primarily by their attributes. They represent a thread of identity that runs through time and often across distinct representations. Sometimes such an object must be matched with another object even though attributes differ. An object must be distinguished from other objects even though they might have the same attributes.” One of the most interesting parts of DDD, one that really stuck with me, is this quote: <read quote> - that’s interesting, yeah?                If I change my name, am I no longer me? Of course not, I’m defined by more than my name. If I change my email address, or my address, or my social security number for some reason, am I no longer me? Obviously not… Can your database understand identity changes like this and still be able to find the original object?
  40. 40. @svpember Domain Driven Design • Ubiquitous Language DDD has several interesting ideas besides that quote of course. When building a system adhering to DDD, it offers several guidelines. The first: Ubiquitous Language. Everyone in the company should be speaking in the same terms. The same concepts. Your classes and objects should reflect the Language. Marketing should be using the same terms as Sales and as Engineering. When Engineering builds a new Feature or new Service the entire company should be using it. If you’re an e-commerce app and product management decides to name the Product Catalog… ah… ‘Zephyr’ for some reason, Engineering is also calling it Zephyr, and there better be a Zephyr.java file somewhere in the repo.
  41. 41. @svpember Domain Driven Design • Ubiquitous Language • Entities / Value Objects
  42. 42. Entity Objects are those that you truly care about tracking. These objects will have specific identifiers and you will take great care in maintaining their individuality and their relationship to other objects.
  43. 43. Value Objects: when you know you have a lot of something, but you don’t care about the identity of each individual item. The example given in the DDD book, I believe, is that an automobile Engine might be an Entity (it has a unique serial number that mechanics might care to track), while the wheels… well, the car has 4, and I don’t necessarily assign any uniqueness to them
  44. 44. @svpember Domain Driven Design • Ubiquitous Language • Entities / Value Objects • Aggregates Third term: Aggregates
  45. 45. Group of Entities With One Root … the aggregate Root. The root acts as the parent or entry point when referencing the aggregate, which leads to the next point:
  46. 46. @svpember Domain Driven Design • Ubiquitous Language • Entities / Value Objects • Aggregates • Bounded Contexts powerful concept logical grouping of related functionality
  47. 47. @svpember The key point here is that Objects inside an Aggregate may hold references to other Roots. But only the root id. They may not hold any information about entities below the root within that context.
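As a sketch of this rule (all names hypothetical): an Order in one context may hold the id of a Customer root, but never the Customer object itself, and nothing below that root:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Order:
    order_id: str
    customer_id: str  # reference to another aggregate by its root id only.
    # NOT `customer: Customer`: holding the other aggregate's internals
    # would couple this context to entities below that root.
```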
  48. 48. Combine and Isolate related objects into Modules It’s a natural step then in your code to ensure that you combine and isolate all related objects within a context into one module.
  49. 49. No Direct comms across boundaries And, no direct communications are allowed across context or module boundaries. Well if that’s true, what do I do if I need information from across boundaries?
  50. 50. @svpember Domain Driven Design • Ubiquitous Language • Entities / Value Objects • Aggregates • Bounded Contexts • Domain Events
  51. 51. @svpember Here we have several of our Modules - whose names mirror our Ubiquitous Language, btw. They cannot talk to each other directly, but rather through some intermediary mechanism. Now, it could be direct message passing, but this differs from importing and calling methods directly in the module. You could use a Pub Sub Mechanism, or some sort of message broker… etc, etc. The important bit is that the modules are bounded away from each other.
  52. 52. Events Are Transactionally Safe That is, no Events are emitted until an item is successfully saved to disk. The event is part of the transaction.
  53. 53. @svpember Domain Driven Design • Ubiquitous Language • Entities / Value Objects • Aggregates • Bounded Contexts • Domain Events • One Last Takeaway…
  54. 54. Wait… what if I need to share across contexts? This may not be phrased correctly, but the answer to this is one of the most powerful aspects of DDD and one of the hardest to get used to. Bounded contexts are isolated, autonomous components with their own entities, classes, service objects, etc…. However, all the bounded contexts exist within the same system, and certain concepts or Entities will likely exist throughout the entirety of a system… although each context may only care about a subset of the info about that entity. Or, another way to phrase it: each bounded context is only concerned with some subset of an Entity within a system, and no context will know the entire set of information about an entity. This separation is the concern of the context’s boundary.
  55. 55. @svpember - For example! The catalog context knows how much inventory is left for this particular SKU, but the shopping cart and the admin context don’t necessarily need the information. The inventory count may be a function of a ‘Warehousing context’ that the catalog receives events for. Similarly, the Shopping Cart context contains the quantity, the # of this SKU that the user wants to purchase. That information has no bearing on the catalog context. This concept - that an entity can exist in multiple contexts, though each context is only concerned with a subset of that information - is very powerful, and very useful. Understanding which information belongs in which context… and maintaining that decoupling, is however, one of the toughest aspects of DDD, and can be a challenge for younger developers or those newer to ddd to grasp. For example, we recently had a situation…
  56. 56. And now, CQRS Command Query Responsibility Segregation is an evolution off of DDD, that calls for changes in how one accepts and sends data
  57. 57. MVC With your standard MVC/ CRUD style approach that you get out of the box with many big frameworks, the pattern generally is something like the following: - user makes a change via the ui, let’s say to a Product object - There’ll likely be a Product Controller which takes that data - passes it to a Product Model - which in turn gets validated and saved to the database, likely in a table named ‘product’ When the user wants to retrieve information about the product, the same objects are used. Query for product with id X, product controller uses a product model to retrieve data from the product row, then passes the retrieved model up to the ui
  58. 58. CQRS CQRS says: why use the same objects for every task? It makes a hard distinction between modifying actions the user is attempting to do - which it calls ‘Commands’ - and data retrieval tasks - aka Queries - and thus breaks up the underlying code to enforce that distinction. So, if a user wants to make a change to some data, say again, a Product, he manipulates the change in the ui, maybe clicks a button, and a relevant controller converts that request into a ProductChangeCommand object or model, which contains details on what the user is trying to change. That command is then validated, then the changes are persisted AND, while not shown here, domain events are emitted. As for Querying, a User dictates the query, it’s packaged by the query controller into a query object, results are pulled from the database, and the response is returned to the user. While it reads similarly, it’s important to understand that the models are different, and often advantageously so. The Product information I display to the end user may be only a subset of the Product model / entity object, so I only pull a few fields from the db. Or, my query object represents a composite of several models in one multi-faceted report that is pulled in one query.
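A rough sketch of that separation, with illustrative names (not a real framework's API): the command is validated on the write side, a domain event is emitted, and a projector keeps a display-shaped view up to date for the query side:

```python
from dataclasses import dataclass

# Write side: the Command captures the change the user is attempting
@dataclass(frozen=True)
class ChangeProductPriceCommand:
    product_id: str
    new_price: float

# Domain event emitted once the change is validated and persisted
@dataclass(frozen=True)
class ProductPriceChanged:
    product_id: str
    new_price: float

# Read side: a denormalized view cache shaped for display
product_views = {}

def handle(command: ChangeProductPriceCommand) -> ProductPriceChanged:
    if command.new_price < 0:
        raise ValueError("price cannot be negative")
    # ...the write model would be persisted here, then the event emitted...
    return ProductPriceChanged(command.product_id, command.new_price)

def project(event: ProductPriceChanged) -> None:
    # The query side reacts to the event and stores only what the UI needs
    product_views.setdefault(event.product_id, {})["price"] = event.new_price
```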
  59. 59. CQRS Following that line of thinking, we can extend this a bit further. We can isolate our writes and our reads into separate contexts or services. Besides the nice decoupling, it allows us to get creative in other areas: - want to scale out our write capability vs our reads? no problem. just scale up one of those services - if our write service emits domain events when it saves, could our query service listen for other domains’ events? sure! - want to have custom query reports that pull from multiple domains? no problem! You can build multiple query models that are highly targeted towards whatever end user report or experience you’re trying to deliver.
  60. 60. Allows for interesting Architectures Continuing that line of thinking, it allows for some very interesting Architectures
  61. 61. This is a graphic taken from Udi Dahan’s website; he’s another pioneer in the CQRS space. What this diagram is trying to depict is similar to what I’ve been describing. The blocks labelled ‘AC’ stand for autonomous component, I think. Think of them like a service. So, in the bottom left, the user enters a command to the first service. It succeeds and the changes are written to local storage. Events are published and retrieved by one or many other services, who update their local query caches based on those events. Then, when the user performs a query, it hits that highly targeted query cache, giving the user the intended results with a minimum of SQL queries or joins.
  62. 62. Ok then, this still seems like a lot of trouble… so why? Why go through all of it?
  63. 63. @svpember Reactive + CQRS • System is ideal for both write- and read-heavy applications • writes are contained for a particular service / bounded context • scale up services receiving writes • create query caches specifically designed to handle queries • can scale those up too • Efficient Querying -> just going to highlight this again • One note: due to the distributed (and, as we’ll see soon, event-based) storage, if your company has analysts they will likely hate you. You’ll likely need to build a service or query store solely for them to run SQL queries against.
  64. 64. @svpember Reactive + CQRS • System is ideal for both write- and read-heavy applications • Service Design via Bounded Contexts
  65. 65. Bounded Contexts are an excellent tool to determine Microservice responsibility and potentially sizing of a service. i.e. how big should it be? when do we create a new one? We went through a team exercise where we tried to figure this out… - Each circle represents a context boundary - big outer circle is 3c itself - four big inner circles are different functional areas of our company - small inner inner circles represent contexts further still What we found was that our services for the most part mapped to the smaller circles, which was great, although there was much duplication (e.g. services belonging to multiple contexts) and some services we identified should be combined; those are both bad.
  66. 66. @svpember Reactive + CQRS • System is ideal for both write- and read-heavy applications • Service Design via Bounded Contexts • Failure Protection
  67. 67. @svpember • For a sufficiently large system, something is always going to be in a failure state • reducing or eliminating calls between services when handling a Command or a Query eliminates the dependency on that or those additional services • Service being queried will still function • One of the tenets of the ‘Reactive Manifesto’, provides a stopgap for failures impacting the user. • For example, if the Shopping Cart Management service is down, my product catalog service should not be affected and the user should still be able to browse the catalog • Additionally, using a durable Message Broker for communication grants additional layers of protection. We use RabbitMQ, but loads of folks have great success with tools like Kafka. These tools will hold on to messages, allowing consuming services to consume them at their leisure. This has advantages in situations where a service is down for a period of time. Or, imagine if a service cannot handle a message and the devs need to fix it; the message waits on the queue until the service is back online, resulting in no data loss. image - broken down service, happy product catalog
  68. 68. @svpember Reactive + CQRS • System is ideal for both write- and read-heavy applications • Service Design via Bounded Contexts • Failure Protection • Promotes Async comms • eliminating or reducing calls between services when handling Commands and Queries also eliminates blocking, synchronous calls to these services • reduces resource contention on thread pools • Also eliminates a potential failure vector: services ‘backing up’ with chains of communications
  69. 69. @svpember Reactive + CQRS • System is ideal for both write- and read-heavy applications • Service Design via Bounded Contexts • Failure Protection • Promotes Async comms • “Simple” Testing • I put ‘simple’ in quotes there because no matter what I say next… this is still a distributed system we’re talking about - Each service can be heavily unit and integration tested in isolation of each other • Testing of the platform as a whole is important and useful. At least for us, a good chunk if not most of our bugs are contract violations in the incoming commands and events (e.g. I thought event A was emitting two fields but it’s actually three and I didn’t have JSON ignore properties set in Jackson, or a service misspelled a variable name, or someone changed event A’s fields.) • To me, this means that our organization is lacking communication and is violating the CAP theorem, or it’s being proven true, or however you want to say it. The point here is to make sure that other teams are aware of the shape of events emitted from your services.
  70. 70. @svpember Reactive + CQRS • System is ideal for both write- and read-heavy applications • Service Design via Bounded Contexts • Failure Protection • Promotes Async comms • “Simple” Testing • Reduces cross-service Querying One of the most important reasons you do something like this? To avoid having to write queries that would otherwise span multiple services. Has anyone had to write queries that span multiple services? It’s a nightmare. so inefficient. So, while this approach may feel like you’re duplicating data or whatnot, it’s quite efficient
  71. 71. @svpember Reactive + CQRS • System is ideal for both write- and read-heavy applications • Service Design via Bounded Contexts • Failure Protection • Promotes Async comms • “Simple” Testing • Reduces cross-service Querying • Also… And it also leads to one of my absolute favorite concepts…
  72. 72. EVENT SOURCING!
  73. 73. @svpember Event Sourcing • Alternate Storage Pattern -> Prevents Data Loss
  74. 74. @svpember More specifically, it’s an alternative to your standard ORM storage mapping, where an object in memory maps directly to a row in a database, even if that row may be split via joins * an update is made to a model, which updates a column in your database * in this method, the only thing you know about your data is what it looks like right now.
  75. 75. @svpember Event Sourcing • Alternate Storage Pattern -> Prevents Data Loss • Store Deltas
  76. 76. Store Deltas, not Current State
  77. 77. @svpember This stream of events is persisted in our database in the order they occurred, as a journal of what has transpired. These events can then be played back against our Domain Object, building it up to the state it would be at any given point in time, although this is most likely the Current State.
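That playback can be sketched in a few lines. The `Event` and `ShoppingCart` types below are hypothetical stand-ins, not code from the talk:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Event:
    type: str            # e.g. "ProductAdded", "ProductRemoved"
    payload: dict
    occurred_at: datetime

@dataclass
class ShoppingCart:
    products: list = field(default_factory=list)

    def apply(self, event: Event) -> "ShoppingCart":
        # Each event type mutates the entity in one well-defined way.
        if event.type == "ProductAdded":
            self.products.append(event.payload["sku"])
        elif event.type == "ProductRemoved":
            self.products.remove(event.payload["sku"])
        return self

def replay(events, up_to=None):
    """Fold the journal into a cart; pass `up_to` for point-in-time state."""
    cart = ShoppingCart()
    for e in sorted(events, key=lambda e: e.occurred_at):
        if up_to and e.occurred_at > up_to:
            break
        cart.apply(e)
    return cart
```

Omitting `up_to` yields current state; supplying an earlier timestamp yields the entity as it existed at that moment.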
  78. 78. @svpember Event Sourcing • Alternate Storage Pattern -> Prevents Data Loss • Store Deltas • Additive Only • Time Series
  79. 79. Never Delete, Never Update It means you only ever insert data into your database. No event rows are ever, ever, ever updated or deleted. In so doing you’ve now turned your events table into an append-only journal, which is very efficient for most databases. In other words, events are immutable!
  80. 80. @svpember Event Sourcing • Alternate Storage Pattern -> Prevents Data Loss • Store Deltas • Additive Only • Time Series • It’s Old Rather, this basic idea of storing deltas, rather than just current state has been around for a long time
  81. 81. - Every transaction you make with your bank. Every Credit or Debit made is logged, along with an audit trail of who (e.g. which teller) made the change. - To get your balance, your bank simply adds up each of these transactions - May also periodically record what the balance was at specific points in time, to prevent having to recalculate everything from the beginning of time. - Can you imagine if you checked your bank statement and it could only show you your current balance… not how you reached that number?
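The bank analogy reduces to a tiny fold: the balance is never stored as a mutable field, it is derived by summing every credit and debit. A toy ledger (illustrative names, amounts in cents):

```python
# Each entry is a delta, never an update to a "balance" column.
transactions = [
    ("credit", 100_00),   # amounts in cents
    ("debit",   25_50),
    ("credit",  10_00),
]

def balance(txns):
    # Current state = the sum of all recorded deltas.
    return sum(amt if kind == "credit" else -amt for kind, amt in txns)
```

A periodic snapshot is just a pre-computed `(balance, as_of)` pair that lets you start the sum from a known point rather than the beginning of time.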
  82. 82. Lawyers! If a contract needs to be adjusted, is the contract thrown out and re-written? No. Rather, ‘addendums’ are placed on the contract. To figure out what the contract actually says, one has to read the initial contract and then each successive addendum, in order.
  83. 83. @svpember Event Sourcing • Alternate Storage Pattern -> Prevents Data Loss • Store Deltas • Additive Only • Time Series • It’s Old • Easy To Implement
  84. 84. Promotes a highly functional style that’s very easy to unit test. If I need to get the current state of an entity, it’s as simple as select * from event where id = x, then passing each event into an entity class, which will build it up to current state using the functions outlined here.
  85. 85. @svpember Event Sourcing • Alternate Storage Pattern -> Prevents Data Loss • Store Deltas • Additive Only • Time Series • It’s Old • Easy To Implement • … Difficult to Grasp
  86. 86. Now, this is where ES may start to hurt your brain
  87. 87. All Entities are Transient Derivatives of the Event Stream
  88. 88. Objects are backed - ‘sourced’ - by events Which is just a fancy way of saying: All Objects are ‘backed’ or ‘sourced’ by various events from the Journal or Event Stream
  89. 89. @svpember Now this has lots of powerful uses which we’ll get into in a bit, but regardless…
  90. 90. Can be difficult for Junior Engineers I’ve found that this entire concept can be a bit difficult for junior developers to grasp at first. So be aware of that. Internal Education can help dramatically.
  91. 91. But, why? I’m sure that I’m making this all sound very attractive. You’re again probably asking yourself… ok, great… but why? Ok, so why?
  92. 92. @svpember Event Sourcing: Why • Append-Only
  93. 93. @svpember Event Sourcing: Why • Append-Only • Prevents Data Loss
  94. 94. Never Delete! With Event Sourcing, no events are EVER deleted or updated, a nice side effect of the Append only nature
  95. 95. @svpember Event Sourcing: Why • Append-Only • Prevents Data Loss • Time Travel
  96. 96. @svpember Event Sourcing: Why • Append-Only • Prevents Data Loss • Time Travel • Perfect For Time Series
  97. 97. @svpember Event Sourcing: Why • Append-Only • Prevents Data Loss • Time Travel • Perfect For Time Series • Automatic Audit Log ++ Built in, Automatic Audit Log for your Entities
  98. 98. Audit Logs tell the History Events tell the Intent of History
  99. 99. Furthermore, having events as a first-order member of your platform can give you enhanced information about what your users or systems are doing, beyond what might normally get written to the database. You can create events that don’t necessarily deal with the properties changed by a user, but with additional actions that may have occurred. And it’s easier to work with and analyze the data if the events are integrated within your platform already.
  100. 100. @svpember One trivial example is this. One of our first ES systems was an internal user management system, where our Program Managers (don’t worry about these terms) track prospective ThirdChannel employees, which we call agents. Our managers wanted a way to get history for each potential agent, and because of Event Sourcing, it was about 5 minutes of work to display the history of each agent like that.
  101. 101. @svpember Event Sourcing: Why • Append-Only • Prevents Data Loss • Time Travel • Perfect For Time Series • Automatic Audit Log ++ • Data Mining and Reporting
  102. 102. @svpember And so you may be thinking, ok great: I can get all the relevant events for Bob’s shopping cart, but I’m only ever going to play them all back to get his current state! And for much of the time, yes, users definitely want to know what their current state is. The magic, though, comes with the business value. Business loves time series data.
  103. 103. @svpember Reporting in Event Temporality • Look at ALL users’ ProductRemoved events: which products are being discarded? • Find ALL ProductAdded + ProductRemoved event pairings (i.e. the same product, user) that occur within 5 minutes: perhaps a user is hesitant about purchasing… maybe offer them a discount? • Find avg time between ProductAdded and OrderPlaced: how long are users sitting with non-empty carts? • Find Trends in all of the above Because you have access to all the data changes that have happened, along with the time relationship… you can get extremely creative. Now, I’m not very creative, but here are a few ideas for reports on just users’ shopping cart events…
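The second report above, the “hesitant shopper”, might look something like this sketch. The dictionary shape of the events is an assumption, not from the talk:

```python
from datetime import datetime, timedelta

def hesitant_removals(events, window=timedelta(minutes=5)):
    """Find (user, sku) pairs where a ProductAdded was followed by a
    ProductRemoved for the same user and product within the window."""
    adds = {}   # (user_id, sku) -> time the product was added
    hits = []
    for e in sorted(events, key=lambda e: e["at"]):
        key = (e["user_id"], e["sku"])
        if e["type"] == "ProductAdded":
            adds[key] = e["at"]
        elif e["type"] == "ProductRemoved":
            added_at = adds.pop(key, None)
            if added_at is not None and e["at"] - added_at <= window:
                hits.append(key)
    return hits
```

Because the journal keeps every delta with a timestamp, reports like this need no extra instrumentation: they are just queries over data you already have.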
  104. 104. Business Types love Reports I assure you, if you start showing off or even hinting to your product research teams, your product owner teams, etc that these capabilities could exist within your platform… they will get very excited
  105. 105. Collecting and applying your events like this is known as a ‘projection’. You’re taking events from your various event streams and projecting them into or onto some data structure. You can also think of it as a Materialized View.
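A projection, then, is just a fold over the stream into a purpose-built read model. A minimal sketch, assuming dictionary-shaped events (a real projection would also persist its position in the stream so it can resume after a restart):

```python
from collections import Counter

def project_removal_counts(events):
    """Materialize a view: how often has each product been removed from a cart?"""
    view = Counter()
    for e in events:
        if e["type"] == "ProductRemoved":
            view[e["sku"]] += 1
    return view
```

The same stream can feed any number of projections, each shaped for one kind of query.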
  106. 106. Is there a performance hit? Save ALL the events? So, that all being said… two questions I generally receive when talking about this… <read> The answer to the first is yes, there is a small performance hit if you play events each time during a user request. However, we’ve found that it’s fairly minimal for entities with fewer than a hundred or so events on them. It’s one SQL query to get all events for an entity to see current state, just as it would be with an ORM. However, we typically follow the CQRS methodology and build current-state projections for the run-time querying that users see. Now, for the answer to the second, let’s jump to the next section….
  107. 107. Questions? Questions before I proceed?
  108. 108. Event Storage Ok, With that all in mind, let’s proceed to the next section, Event Storage… let’s talk about how we store all these events. First, I want to address the two questions I left on.
  109. 109. First, in an event-oriented world, you’re going to acquire quite a few events. It’s going to seem pretty messy after a while. You’ll end up with perhaps larger database sizes than you originally thought; remember, though, that this is largely because you’re now tracking a third dimension, time, within your data store. And really… data storage is CHEAP. Most of us are not Netflix or Facebook, and the scale of events we’ll be working with is very manageable.
  110. 110. Now, of course, if this is at all bothersome, you can adopt a compaction strategy. The most well known is Snapshotting, where you compress related events older than, say, two years into one object, then extract the raw events and put them into cheaper long-term storage like AWS Glacier. Still never throw them away, though.
  111. 111. @svpember You can also use snapshotting more frequently as a mechanism to alleviate performance troubles. Make a snapshot on some interval, say every week… or every 100 events. You load the snapshot first, then find all events since the snapshot was taken. The issue here is that this adds an additional query to fetch the snapshot, so it’s only worthwhile if replaying the events you’d otherwise skip takes longer than that extra database query. If your events are pure functions, it will take a fairly high number of them to be worse than the db query.
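A minimal sketch of that snapshot-then-replay load path. The shapes of the snapshot tuple and the event dictionaries are assumptions for illustration:

```python
def load_entity(snapshot, events, apply):
    """Start from the latest snapshot, then fold in only the events
    recorded after the revision the snapshot covers."""
    state, snap_revision = snapshot            # (state, last revision folded in)
    for e in sorted(events, key=lambda e: e["revision"]):
        if e["revision"] > snap_revision:      # skip events the snapshot already absorbed
            state = apply(state, e)
    return state
```

The `apply` parameter is the same pure event-handling function you would use for a full replay, which is what makes snapshotting a pure optimization rather than a second code path.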
  112. 112. Event Schema Anyway, let’s discuss what an event looks like on disk. And by that, I mean… physically, what does our database schema look like? With event sourcing, there’s no real ‘correct’ way of doing things. You can use any type of data storage to store your events, although we at ThirdChannel prefer Postgres and have had some experience with Cassandra. Anyway, I think there are generally two approaches to what an event looks like on disk. It’s either:
  113. 113. @svpember Table Per Event: specific schema for the properties of an event
  114. 114. An event, at minimum, must have some identifier (we use uuids), a link to the entity id that this event belongs to, the revision number of the event - this keeps events ordered and supports guards like optimistic locking - and the id of the user who triggered the event. Old price, new price, and currency are all specific to this event.
  115. 115. @svpember Or, one event table. In this scenario, as you may imagine, ALL your events are in one table. It’s easy to, say, shard this table by date or something, but basically, one event table. Here, our events are essentially schema-less. Or rather, there is an implicit schema: as your events are parsed or deserialized from disk, the application imposes the schema at read time.
  116. 116. example db schema Does anyone here not know about the jsonb datatype? Seriously, write this down. Go look at this thing. Switch to using Postgres entirely for it. It’s a better document store than MongoDB. I don’t like MongoDB very much, although admittedly I haven’t given it a fair shake these past few years. True story: Monday a friend of mine…
  117. 117. @svpember Recommend: One Event Table • No Migrations, past the first (+) • Trivial to Add New Events (+) • Selecting Multiple Event Types for a single Entity in one Query, no joins (++) • ProTip: Use Postgres and the jsonb data type (+) • Querying across multiple event types with no joins (+) • Zero to Minimal Database Constraints (-) My recommendation is One event table… + and -
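A sketch of the one-event-table idea. This uses sqlite with a TEXT payload so it can run anywhere; in Postgres the payload column would be jsonb as recommended above. All names here are illustrative:

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE events (
        id         TEXT PRIMARY KEY,
        entity_id  TEXT NOT NULL,
        type       TEXT NOT NULL,
        revision   INTEGER NOT NULL,
        payload    TEXT NOT NULL,        -- jsonb in Postgres
        UNIQUE (entity_id, revision)     -- ordering + optimistic-locking guard
    )
""")

def append_event(eid, entity_id, type_, revision, payload):
    # Append-only: inserts are the only write the table ever sees.
    db.execute("INSERT INTO events VALUES (?, ?, ?, ?, ?)",
               (eid, entity_id, type_, revision, json.dumps(payload)))

def events_for(entity_id):
    # Every event type for one entity in a single query -- no joins.
    rows = db.execute("SELECT type, payload FROM events "
                      "WHERE entity_id = ? ORDER BY revision", (entity_id,))
    return [(t, json.loads(p)) for t, p in rows]
```

The UNIQUE constraint on (entity_id, revision) is what rejects a concurrent writer appending with a stale revision number.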
  118. 118. Data Locality & Service Lifecycle This title may be strange but bear with me. So, now that we’ve chosen the schema for events… the question still stands: how does this change in light of a distributed environment? If I’ve been arguing that we can have multiple data stores within our system, and each of services are generating events… well, where should these events physically live?
  119. 119. @svpember It’s a bit of a spectrum. Given the assumption that each service has a data store, I think there are two basic ‘pure’ strategies for event storage.
  120. 120. Service - Local Storage On one end, each service is responsible for storing a certain set of domain events. Basically, anything that a service emits it should also store. This of course requires that each service have its own datastore and generally will operate in a way you may be accustomed to. In addition to the events, you’ll likely have models or materialized views representing the current state that are updated by the events
  121. 121. Central Store • On the other end is the central event store. It has some mechanism to listen to all events that are emitted within your system and then save them to one general data store. Conceptually it’s one service that writes to that store (single writer), but it can handle read requests from other services (single writer, multiple readers).  ◦ An interesting architectural consequence of this pattern is that it opens up your services to not needing a local data store at all… they would be entirely event driven and hold their entire state in a local cache. It may sound crazy, but it is entirely feasible in this structure. Anyway, which approach to pick? I don’t think there’s a right answer; it’s really what’s comfortable for you and your team. What can help, though, is looking at the different lifecycle moments your services can go through, to better illustrate each approach.
  122. 122. @svpember Event Storage Workflow Scenarios • How Does a Service access events? First, the most basic. What happens when a service wants to access events?
  123. 123. @svpember Distributed (Local) How do we query? The answer should be fairly obvious:
  124. 124. @svpember Distributed (Remote)
  125. 125. @svpember Central Store <walk through> In this scenario, I would advocate skipping routing the requests through the message queue. The central store should be fairly prominent; direct tcp/http queries should be just fine. With the distributed scenario, services may come and go, and you typically broadcast the query to receive events without knowing exactly who/what contains those events. With the central store, you do know, so you can typically skip the Message Queue if you’d like.
  126. 126. @svpember Event Storage Workflow Scenarios • How Does a Service access events? • What happens when we bring up a new service? What happens when we bring up a new service? In this scenario: - new service - empty - needs various events from different domains in order to bring itself into alignment with the current state of the other services
  127. 127. @svpember Distributed
  128. 128. @svpember Distributed service appears. I need a,b,c!. responses
  129. 129. @svpember Central Store service appears. queries central store
  130. 130. @svpember Event Storage Workflow Scenarios • How Does a Service access events? • What happens when we bring up a new service? • What happens when a service misses or fails to process an event? What happens when a service misses an event? I’d argue that it’s difficult to ‘miss’ an event if you’re using a message broker; it should hold on to messages until services can read them. But perhaps there’s some catastrophic event and you lose messages on the queue. What’s more likely is that your service won’t know how to handle an event it receives. What we’re really talking about here is the ability to reprocess events.
  131. 131. @svpember Distributed It’s basically the same as when a service comes online, although you’ll need a smaller set of data
  132. 132. @svpember central
  133. 133. @svpember Event Storage Workflow Scenarios • How Does a Service access events? • What happens when we bring up a new service? • What happens when a service misses or fails to process an event? • What about out-of-order events?
  134. 134. @svpember Event Storage Workflow Scenarios • How Does a Service access events? • What happens when we bring up a new service? • What happens when a service misses or fails to process an event? • What about out-of-order events? • What is the process for decommissioning a service?
  135. 135. @svpember Decommission - Distributed • Are we bringing up a new service? • Are we simply killing functionality? • Don’t get rid of the events! With Distributed, you need a plan for what to do with the events in the system you’re shutting down. - If you’re replacing an old service with a new, refactored version of it, you should be fine, so long as the new version knows how to respond to the same requests for data and to handle the same commands and events. - If you’re killing the service entirely, that likely means the functionality is also going away. - Still, something needs to be responsible for the events to support old requests. - Consider offloading the events and any query mechanisms to the most closely related service.
  136. 136. @svpember Decommission - Centralized • … just delete the service With Centralized, the process is much easier. If you know the service isn’t needed anymore… well, it’s gone. The events it was responsible for creating are still in the central store should you need them.
  137. 137. Recap
  138. 138. @svpember Per-Service Storage • Less infrastructure • Proper containment of events • Requires multiple event consumers and ‘rebroadcast’ mechanisms
  139. 139. @svpember Centralized Storage • Much larger footprint • Convenient • Violates ‘self-containedness’ and distribution of events • Easier for mining purposes • One rebroadcast mechanism
  140. 140. Managing Events
  141. 141. Event Versioning
  142. 142. Event Storming
  143. 143. @svpember Event Storming Building Blocks • Events • Reactions - “Whenever an account is created, I need to send an email” • System Commands • User-Initiated Commands • External Systems • Policy - Flow of Events and Reactions
  144. 144. Event Storming- Policy
  145. 145. Conclusion: Yay, Events. I realized that this talk ends rather abruptly. So here’s a slide to pace things out. Events are great.
  146. 146. Thank You! @svpember spember@gmail.com @svpember
  147. 147. @svpember Images • Ant with stick: https://www.reddit.com/r/photoshopbattles/comments/1uh80p/ perfectly_timed_photo_of_an_ant_lifting_a_stick/ • Why - man with bowtie: https://silktide.com/dear-ico-this-is-why-web-developers-hate- you/ • Why - Ryan Reynolds: http://www.reactiongifs.com/r/but-why.gif • Why - Jon Stewart: https://www.huffingtonpost.com/2013/10/25/jon-stewart- apologizes-for-us_n_4162980.html • EventStorming: https://blog.redelastic.com/corporate-arts-crafts-modelling-reactive- systems-with-event-storming-73c6236f5dd7 • Fireworks: https://commons.wikimedia.org/wiki/ File:Canada%27s_fireworks_at_the_2013_Celebration_of_Light_in_Vancouver,_BC.jpg • Mad Developers: https://www.hbo.com/last-week-tonight-with-john-oliver • Workers removing delete key: https://gcn.com/articles/2015/03/31/deleted-emails.aspx
  148. 148. @svpember Further Reading • Event Versioning: https://leanpub.com/esversioning/read • Event Storming: https://blog.redelastic.com/corporate-arts-crafts- modelling-reactive-systems-with-event-storming-73c6236f5dd7
