We had the great pleasure of hosting a talk by Dr. Roland Kuhn, leader of Typesafe’s Akka project and coauthor of both the book Reactive Design Patterns and the Reactive Manifesto. For a standing-room-only crowd, Roland highlighted the importance of making software reactive, which means considering responsiveness, maintainability, elasticity and scalability from the outset of development. He explored several architecture elements that are commonly found in reactive systems, such as the circuit breaker, various replication techniques, and flow control protocols. These patterns are language-agnostic and independent of whichever reactive programming framework or library you choose. Check out his slides!
9. Implementation: Message-Driven
• focus on communication between components
• model message flows and protocols
• common transports: async HTTP, *MQ, Actors
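To make "model message flows and protocols" concrete, here is a minimal sketch of a component described only by the messages it accepts and the replies it emits. All names (`SubmitJob`, `QueryStatus`, `JobAccepted`, `JobRejected`, `Status`, `JobComponent`) are hypothetical and not taken from the talk, and the handler is synchronous purely for brevity; in a real system the same protocol would travel over one of the asynchronous transports listed above.

```scala
// Hypothetical protocol for a job-submission component; all names are illustrative.
sealed trait Request
final case class SubmitJob(id: Long, payload: String) extends Request
final case class QueryStatus(id: Long) extends Request

sealed trait Reply
final case class JobAccepted(id: Long) extends Reply
final case class JobRejected(id: Long, reason: String) extends Reply
final case class Status(id: Long, state: String) extends Reply

// The component is modelled purely by the messages it accepts and emits.
final class JobComponent {
  private var known = Map.empty[Long, String]

  def handle(msg: Request): Reply = msg match {
    case SubmitJob(id, payload) if payload.isEmpty =>
      JobRejected(id, "empty payload")
    case SubmitJob(id, _) =>
      known += (id -> "queued")
      JobAccepted(id)
    case QueryStatus(id) =>
      Status(id, known.getOrElse(id, "unknown"))
  }
}
```

Because the protocol is a pair of sealed hierarchies, the compiler can check that every request type is handled, which is one practical benefit of writing the message flow down as types first.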
13. Simple Component Pattern
• Single Responsibility Principle formulated by
DeMarco in «Structured analysis and system
specification» (Yourdon, New York, 1979)
• “maximize cohesion and minimize coupling”
• “a class should have only one reason to change”
(Uncle Bob Martin’s formulation for OOD)
14. Example: the Batch Job Service
• users submit jobs
• planning and validation rules
• execution on elastic compute cluster
• users query job status and results
19. Let-It-Crash Pattern
• Candea & Fox: “Crash-Only Software”
(USENIX HotOS IX, 2003)
• transient and rare failures are hard to detect and fix
• write component such that full restart is always o.k.
• simplified failure model leads to more reliability
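The idea that "full restart is always o.k." can be sketched without any framework. The helper below is an assumption-laden illustration, not Akka supervision: `LetItCrash.supervise` and its parameters are hypothetical names. It never tries to repair a failed component. It throws the instance away, creates a fresh one, and tries again, up to a restart limit.

```scala
import scala.util.{Failure, Success, Try}

object LetItCrash {
  // Run `step` against a freshly created component; on any non-fatal failure
  // discard that component and start over with a new one.
  def supervise[C, A](maxRestarts: Int)(create: () => C)(step: C => A): Try[A] = {
    @annotation.tailrec
    def loop(restartsLeft: Int): Try[A] =
      Try(step(create())) match {
        case s @ Success(_) => s
        case Failure(_) if restartsLeft > 0 => loop(restartsLeft - 1)
        case f @ Failure(_) => f
      }
    loop(maxRestarts)
  }
}
```

The only failure model the caller has to reason about is "it worked" or "it kept crashing after N restarts", which is the simplification the slide refers to.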
20. Let-It-Crash Pattern
• Erlang philosophy from day one
• popularized by Netflix Chaos Monkey
• make sure that system is resilient by arbitrarily performing
recovery restarts
• exercise failure recovery code paths for real
• failure will happen, fault-avoidance is doomed
23. Circuit Breaker Pattern
• well-known, inspired by electrical engineering
• first published by M. Nygard in «Release It!»
• protects both ways:
• allows client to avoid long failure timeouts
• gives service some breathing room to recover
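The "protects both ways" behaviour is easiest to see in a stripped-down state machine. What follows is a single-threaded sketch under stated assumptions, not Akka's `CircuitBreaker`: the class name `SimpleBreaker`, its parameters, and the injected `now` clock are all hypothetical, and per-call timeouts are left out.

```scala
import scala.util.{Failure, Success, Try}

// Simplified, single-threaded sketch; not Akka's implementation.
final class SimpleBreaker(maxFailures: Int, resetMillis: Long, now: () => Long) {
  private var failures = 0
  private var openedAt: Option[Long] = None // Some(t) while the breaker is open

  def call[A](body: => A): Try[A] = openedAt match {
    case Some(t) if now() - t < resetMillis =>
      // Open: fail fast without touching the service.
      Failure(new IllegalStateException("circuit breaker is open"))
    case _ =>
      // Closed, or half-open after the reset interval: let the call through.
      Try(body) match {
        case ok @ Success(_) =>
          failures = 0
          openedAt = None
          ok
        case ko @ Failure(_) =>
          failures += 1
          if (failures >= maxFailures) openedAt = Some(now())
          ko
      }
  }
}
```

While the breaker is open the client gets an immediate failure instead of waiting for a timeout, and the service receives no traffic at all until the reset interval has passed.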
24. Circuit Breaker Example
// Assumed imports for this fragment; it lives inside a class that has an
// ActorSystem named `system` in scope.
import akka.pattern.{CircuitBreaker, CircuitBreakerOpenException}
import scala.concurrent.{Future, TimeoutException}
import scala.concurrent.duration._
import system.dispatcher // ExecutionContext for map and recover

private object StorageFailed extends RuntimeException

private def sendToStorage(job: Job): Future[StorageStatus] = {
  // make an asynchronous request to the storage subsystem
  val f: Future[StorageStatus] = ???
  // map storage failures to Future failures to alert the breaker
  f.map {
    case StorageStatus.Failed => throw StorageFailed
    case other => other
  }
}

private val breaker = CircuitBreaker(
  system.scheduler, // used for scheduling timeouts
  5,                // number of failures in a row when it trips
  300.millis,       // timeout for each service call
  30.seconds)       // time before trying to close after tripping

def persist(job: Job): Future[StorageStatus] =
  breaker
    .withCircuitBreaker(sendToStorage(job))
    .recover {
      case StorageFailed => StorageStatus.Failed
      case _: TimeoutException => StorageStatus.Unknown
      case _: CircuitBreakerOpenException => StorageStatus.Failed
    }
29. Multiple-Master Replication Patterns
• this is a tough problem with no perfect solution
• requires a trade-off to be made between
consistency and availability
• consensus-based focuses on consistency
• conflict-free focuses on availability
• conflict resolution gives up a bit of both
• each requires a different programming model and can
express different transactional behavior
30. Consensus-Based Replication
• strong coupling between replicas to ensure that all
are “on the same page”
• unavailable during network outages or certain
machine failures
• programming model “just like a single thread”
• Postgres, Zookeeper, etc.
31. Replication with Conflict Resolution
• requires conflict detection
• resolution without user intervention will have to
discard some updates
• detection/resolution unavailable during partitions
• programming model “like single thread” with caveat
• popular RDBMSs in their default configuration offer this
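One common way to resolve conflicts without user intervention is a last-writer-wins rule, shown here as a minimal sketch. It is a swapped-in illustration rather than something from the slides, and the names `Versioned`, `timestamp`, `replica` and `LastWriterWins.resolve` are hypothetical. It makes the cost on the slide visible: of two concurrent updates, one is silently discarded.

```scala
// Hypothetical last-writer-wins register; `timestamp` and `replica` are illustrative fields.
final case class Versioned[A](value: A, timestamp: Long, replica: String)

object LastWriterWins {
  // Deterministic resolution: the higher timestamp wins, ties are broken by
  // replica name. The losing update is discarded.
  def resolve[A](a: Versioned[A], b: Versioned[A]): Versioned[A] =
    if (a.timestamp > b.timestamp) a
    else if (a.timestamp < b.timestamp) b
    else if (a.replica.compareTo(b.replica) >= 0) a
    else b
}
```

The tie-break on the replica name matters: without it, two replicas could each keep their own value for equal timestamps and never converge.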
32. Conflict-Free Replication
• express updates such that they can be merged
• cannot express “non-local” constraints
• all expressible updates can be performed under any
conditions without losses or inconsistencies
• replicas may temporarily be out of sync
• different programming model, explicitly distributed
• Riak 2.0, Akka Distributed Data
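A grow-only counter is probably the smallest example of "express updates such that they can be merged". The sketch below is a plain-Scala illustration with a hypothetical `GCounter` class; it is not the API of Riak or Akka Distributed Data. Each replica increments only its own slot, and merging takes the per-slot maximum, so merges can happen in any order and any number of times.

```scala
// Grow-only counter (G-Counter): one slot per replica, merged by taking the maximum.
final case class GCounter(slots: Map[String, Long] = Map.empty) {
  // A replica only ever increments its own slot.
  def increment(replica: String, by: Long = 1L): GCounter = {
    require(by >= 0, "a grow-only counter cannot decrease")
    GCounter(slots.updated(replica, slots.getOrElse(replica, 0L) + by))
  }

  // The counter value is the sum over all replicas.
  def value: Long = slots.values.sum

  // Merge is commutative, associative and idempotent.
  def merge(that: GCounter): GCounter =
    GCounter((slots.keySet ++ that.slots.keySet).map { k =>
      k -> math.max(slots.getOrElse(k, 0L), that.slots.getOrElse(k, 0L))
    }.toMap)
}
```

The limitation from the slide shows up as soon as you want a "non-local" constraint such as "the total must never exceed 100": no single replica can enforce that on its own slot without coordinating with the others.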
35. Saga Pattern: Background
• Microservice Architecture means distribution of
knowledge, no more central database instance
• Pat Helland:
• “Life Beyond Distributed Transactions”, CIDR 2007
• “Memories, Guesses, and Apologies”, MSDN blog 2007
• What about transactions that affect multiple
microservices?
36. Saga Pattern
• Garcia-Molina & Salem: “SAGAS”, ACM, 1987
• Bank transfer avoiding lock of both accounts:
• T₁: transfer money from X to local working account
• T₂: transfer money from local working account to Y
• C₁: compensate failure by transferring money back to X
• Compensating transactions are executed during
Saga rollback
• concurrent Sagas can see intermediate state
37. Saga Pattern
• backward recovery:
T₁ T₂ T₃ C₃ C₂ C₁
• forward recovery with save-points:
T₁ (sp) T₂ (sp) T₃ (sp) T₄
• in practice Sagas need to be persistent to recover
after hardware failures, meaning backward recovery
will also use save-points
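Backward recovery can be sketched as a small in-memory runner. This is an illustration under assumptions: `Step` and `SagaRunner.execute` are hypothetical names, nothing is persisted (so there are no save-points), and a failing transaction is assumed to leave no effects of its own, so only the steps that completed are compensated.

```scala
import scala.util.{Failure, Success, Try}

// One saga step: a transaction T and its compensating transaction C.
final case class Step(name: String, run: () => Unit, compensate: () => Unit)

object SagaRunner {
  // Runs steps in order; on the first failure, compensates the completed
  // steps in reverse order and reports the failure.
  def execute(steps: List[Step]): Try[Unit] = {
    @annotation.tailrec
    def loop(remaining: List[Step], done: List[Step]): Try[Unit] = remaining match {
      case Nil => Success(())
      case step :: rest =>
        Try(step.run()) match {
          case Success(_) => loop(rest, step :: done)
          case Failure(e) =>
            done.foreach(_.compensate()) // `done` is already newest-first
            Failure(e)
        }
    }
    loop(steps, Nil)
  }
}
```

With three steps where the third fails, the runner executes T₁ T₂, then C₂ C₁. A persistent version would record each completed step before moving on, which is where the save-points mentioned above come in.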
38. Example: Bank Transfer
trait Account {
  def withdraw(amount: BigDecimal, id: Long): Future[Unit]
  def deposit(amount: BigDecimal, id: Long): Future[Unit]
}
case class Transfer(amount: BigDecimal, x: Account, y: Account)
sealed trait Event
case class TransferStarted(amount: BigDecimal, x: Account, y: Account) extends Event
case object MoneyWithdrawn extends Event
case object MoneyDeposited extends Event
case object RolledBack extends Event
39. Example: Bank Transfer
// Assumed imports for this fragment.
import akka.pattern.pipe
import akka.persistence.PersistentActor

class TransferSaga(id: Long) extends PersistentActor {
  import context.dispatcher

  override val persistenceId: String = s"transaction-$id"

  override def receiveCommand: PartialFunction[Any, Unit] = {
    case Transfer(amount, x, y) =>
      // persist the intent first, then start the first step (T₁)
      persist(TransferStarted(amount, x, y))(withdrawMoney)
  }

  def withdrawMoney(t: TransferStarted): Unit = {
    // send the outcome back to this actor as a message
    t.x.withdraw(t.amount, id).map(_ => MoneyWithdrawn).pipeTo(self)
    context.become(awaitMoneyWithdrawn(t.amount, t.x, t.y))
  }

  def awaitMoneyWithdrawn(amount: BigDecimal, x: Account, y: Account): Receive = {
    case m @ MoneyWithdrawn => persist(m)(_ => depositMoney(amount, x, y))
  }
  ...
}
41. Example: Bank Transfer
override def receiveRecover: PartialFunction[Any, Unit] = {
  var start: TransferStarted = null
  var last: Event = null

  // the blank line above keeps this block from being parsed as an argument to `null`
  {
    case t: TransferStarted => { start = t; last = t }
    case e: Event => last = e
    case RecoveryCompleted =>
      // resume the saga from the last persisted event
      last match {
        case null => // wait for initialization
        case t: TransferStarted => withdrawMoney(t)
        case MoneyWithdrawn => depositMoney(start.amount, start.x, start.y)
        case MoneyDeposited => context.stop(self)
        case RolledBack => context.stop(self)
      }
  }
}
42. Saga Pattern: Reactive Full Circle
• Garcia-Molina & Salem note:
• “search for natural divisions of the work being performed”
• “it is the database itself that is naturally partitioned into
relatively independent components”
• “the database and the saga should be designed so that
data passed from one sub-transaction to the next via local
storage is minimized”
• fully aligned with Simple Components and isolation
44. Conclusion
• reactive systems are distributed
• this requires new (old) architecture patterns
• … helped by new (old) code patterns & abstractions
• none of this is dead easy: thinking is required!