Adopting actors was a journey, and some of our early assumptions would create issues later on. Slowly we realised that we had not fully embraced the let-it-crash philosophy, did not understand how the atomic nature of actors affects data sovereignty, or how parts of our application would have to be eventually consistent.
In this talk I will share some of our mistakes, how we arrived at them and how we eventually resolved them. I explore the reasoning behind those mistakes and the lessons we learned, so that you can hopefully recognise and avoid them.
https://www.youtube.com/watch?v=4R1u7EEDn8Y
35. Lessons
- Atomicity and Consistency
- Actor modeling ≠ Object modeling
- Test for Resilience not robustness
- Refactor Early
Speaker notes
Principal Engineer – Workday’s Grid Cloud Master team – Who is Workday?
Finance and Human Capital Management – ERP vendor – 100% in the cloud – all customers on a single version
Fiscal 2016 Total Revenue of $1.16 billion, up 48% year over year
Over 5000 employees, over 500 employees in Dublin
2016: Best Workplaces in Ireland, Great Place to Work Institute (#2 for large companies)
2016: 10 Best Large Workplaces in Tech, Fortune (#2)
Provides an elastic grid to other services
Reliable execution of background tasks, or Jobs – from PDF printing to payroll
Cloudmaster – Agents – schedules Jobs and assigns them to Agents
5 pools of agents
Different types of task, memory size, execution speed
This talk is about the lessons I learned migrating a multithreaded java server application to Akka.
To support this growth we need to move to stateful services. Why?
Actor model of concurrency:
Safer (no deadlocks)
Easier to reason about
Easier to test
Better distribution
Easier scalability
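To make these claims concrete: an actor owns private state and handles one message at a time, so callers never share that state or take locks. What follows is a minimal sketch in plain Scala, not Akka. It assumes a single-threaded executor as the mailbox; TinyActor, CounterActor and CounterDemo are hypothetical names, and real Akka actors share dispatcher threads rather than owning one each.

```scala
import java.util.concurrent.{ExecutorService, Executors, TimeUnit}
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

// Hypothetical, minimal "actor" (not Akka): messages are queued and handled
// one at a time on a single thread, so state touched only inside `receive`
// is never accessed concurrently and needs no locks.
abstract class TinyActor[M] {
  private val mailbox: ExecutorService = Executors.newSingleThreadExecutor()

  // Fire-and-forget send: the caller never blocks and never takes a lock.
  final def tell(msg: M): Unit =
    mailbox.execute(new Runnable { def run(): Unit = receive(msg) })

  protected def receive(msg: M): Unit

  final def stop(): Unit = {
    mailbox.shutdown()
    mailbox.awaitTermination(5, TimeUnit.SECONDS)
  }
}

sealed trait CounterMsg
final case class Increment(by: Int) extends CounterMsg
// Replies travel back through a Promise, not through a reference to the sender.
final case class Get(replyTo: Promise[Int]) extends CounterMsg

final class CounterActor extends TinyActor[CounterMsg] {
  private var count = 0 // only ever read or written inside receive

  protected def receive(msg: CounterMsg): Unit = msg match {
    case Increment(by) => count += by
    case Get(replyTo)  => replyTo.success(count)
  }
}

object CounterDemo {
  // Four threads send to the same actor; no mutex is needed for a correct total.
  def run(): Int = {
    val counter = new CounterActor
    val senders = (1 to 4).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = (1 to 1000).foreach(_ => counter.tell(Increment(1)))
      })
    }
    senders.foreach(_.start())
    senders.foreach(_.join())
    val reply = Promise[Int]()
    counter.tell(Get(reply))
    try Await.result(reply.future, 5.seconds)
    finally counter.stop()
  }
}
```

CounterDemo.run() returns 4000: only the mailbox thread ever touches count, so the 4000 increments cannot interleave.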
Then Scala, because of Akka – the key selling point
Trying to avoid two-way relationships (coupling – mutability)
Static state should be immutable
Everyone knows about the God class – threading and mutexes make this worse
Some are big (Marlon Brando), some are small (Robert Downey Junior) – me
Even when small, they have an entourage
AgentPoolActor – responsible for the Agent actors, the queue of tasks and their assignments
Decomposed into separate classes and traits – still one actor with an entourage
Also drives more bad decisions
AgentPoolActor and AgentStateActor
External DB changes – sending notifications – message loss – recovery
Caused by the movie star – we thought the problem was that the stream of events was inconsistent, so we fixed that
State inconsistent – failure – production outage
… Beauty of split brains
AgentPoolActor takes job from the Queue
Assigns it to an Agent
Agent might fail and put it back
Either the Pool or the Agent might own the job – we cannot reliably find it
e.g. cancelling a Job
Who owns it, and when?
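One way out of the "who owns the job?" problem is a single owner: one actor is the only writer of every job's location, and agents report failures to it instead of putting jobs back themselves. Below is a hypothetical sketch of that owner's state transitions as pure Scala; JobState, PoolState, PoolDemo and the message names are illustrative, not the talk's code.

```scala
// Hypothetical job states, owned by exactly one actor (the pool).
sealed trait JobState
case object Queued extends JobState
final case class Assigned(agent: String) extends JobState
case object Cancelled extends JobState

// Hypothetical messages the owner accepts.
sealed trait PoolMsg
final case class Submit(jobId: String) extends PoolMsg
final case class AgentFailed(agent: String) extends PoolMsg
final case class CancelJob(jobId: String) extends PoolMsg

final case class PoolState(jobs: Map[String, JobState], idleAgents: List[String]) {
  // Each message is one atomic step from old state to new state, so
  // "where is job X?" always has exactly one answer: jobs(X).
  def handle(msg: PoolMsg): PoolState = msg match {
    case Submit(id) =>
      copy(jobs = jobs + (id -> Queued)).assignIdle

    case AgentFailed(agent) =>
      // The agent does not put the job back itself; it reports, the owner decides.
      val requeued = jobs.map {
        case (id, Assigned(a)) if a == agent => (id, Queued: JobState)
        case other                          => other
      }
      PoolState(requeued, idleAgents.filterNot(_ == agent)).assignIdle

    case CancelJob(id) =>
      jobs.get(id) match {
        case Some(Assigned(agent)) =>
          // Telling the agent to stop the work is omitted from this sketch.
          PoolState(jobs + (id -> Cancelled), idleAgents :+ agent).assignIdle
        case Some(Queued) => copy(jobs = jobs + (id -> Cancelled))
        case _            => this // unknown or already cancelled
      }
  }

  // Hand queued jobs (in id order) to idle agents.
  private def assignIdle: PoolState = {
    val queued = jobs.collect { case (id, Queued) => id }.toList.sorted
    val pairs  = queued.zip(idleAgents)
    PoolState(
      jobs ++ pairs.map { case (id, agent) => (id, Assigned(agent): JobState) },
      idleAgents.drop(pairs.size))
  }
}

object PoolDemo {
  // Replays messages through the owner one at a time, as its mailbox would.
  def run(msgs: PoolMsg*): PoolState =
    msgs.foldLeft(PoolState(Map.empty, List("agent-1")))((s, m) => s.handle(m))
}
```

Because cancellation, agent failure and assignment all pass through one handler in this sketch, a cancel cannot race with an agent returning the same job.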
PoolActor has decided to assign a task to an Agent
Async message to StateActor – PoolActor must ensure the Agent is not reused before the reply arrives
What if the reply times out?
Crash – can I guarantee consistency? What happens to the job?
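The timeout is the crux: a timed-out request tells the PoolActor nothing about whether the StateActor applied the change. Akka's ask pattern behaves the same way, since a timeout fails the caller's future without undoing anything on the receiving side. The sketch below reproduces this in plain Scala with a scheduler instead of Akka; AskTimeoutDemo, recordAssignment and withTimeout are hypothetical names, and the 50 ms and 300 ms figures are arbitrary.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit, TimeoutException}
import java.util.concurrent.atomic.AtomicReference
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.util.Try

object AskTimeoutDemo {
  // Run `body` once, after `d`, on the scheduler.
  private def after(sched: ScheduledExecutorService, d: FiniteDuration)(body: => Unit): Unit = {
    sched.schedule(new Runnable { def run(): Unit = body }, d.toMillis, TimeUnit.MILLISECONDS)
    ()
  }

  // Hypothetical stand-in for the StateActor: it does record the assignment,
  // but only acknowledges after `delay` (busy mailbox, GC pause, slow network).
  private def recordAssignment(sched: ScheduledExecutorService, store: AtomicReference[Option[String]],
                               agent: String, delay: FiniteDuration): Future[Unit] = {
    val ack = Promise[Unit]()
    after(sched, delay) { store.set(Some(agent)); ack.success(()); () }
    ack.future
  }

  // Ask with a deadline: fail with a TimeoutException unless `f` completes first.
  private def withTimeout[T](sched: ScheduledExecutorService, f: Future[T], timeout: FiniteDuration): Future[T] = {
    val p = Promise[T]()
    after(sched, timeout) { p.tryFailure(new TimeoutException(s"no reply within $timeout")); () }
    p.completeWith(f)
    p.future
  }

  // Returns (what the pool concluded, what the state holder actually recorded).
  def run(): (String, Option[String]) = {
    val sched = Executors.newScheduledThreadPool(2)
    val store = new AtomicReference[Option[String]](None)
    try {
      val ack = recordAssignment(sched, store, "agent-7", 300.millis)
      val poolSaw = Try(Await.result(withTimeout(sched, ack, 50.millis), 1.second))
        .fold(_.getClass.getSimpleName, _ => "ack")
      Thread.sleep(500) // let the slow side finish
      (poolSaw, store.get)
    } finally sched.shutdown()
  }
}
```

The pool sees a TimeoutException while the state holder ends up recording agent-7: the two sides disagree. A longer timeout only narrows that window; something other than the request itself has to establish what happened to the job.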
Chaos Marmoset base actor overrides the unhandled method
Messages can cause failures or delays
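The notes do not show how the Chaos Marmoset works. One reading, assumed here, is that chaos messages match no ordinary handler, so they fall through to unhandled, where the shared base class turns them into a crash or a delay. This sketch mimics the shape of an Akka classic actor (a partial receive plus an unhandled fallback) in plain Scala so it runs without Akka; BaseActor, ChaosMarmosetActor, InjectFailure, InjectDelay and EchoActor are hypothetical.

```scala
// Hypothetical chaos messages; ordinary handlers never match them.
sealed trait Chaos
final case class InjectFailure(reason: String) extends Chaos
final case class InjectDelay(millis: Long) extends Chaos

final class ChaosException(reason: String) extends RuntimeException(reason)

// Mimics an Akka classic actor's shape: a partial `receive`, with anything
// it does not match routed to `unhandled`.
abstract class BaseActor {
  def receive: PartialFunction[Any, Unit]
  // Akka's default publishes an UnhandledMessage to the event stream; here we ignore it.
  def unhandled(msg: Any): Unit = ()
  final def deliver(msg: Any): Unit = receive.applyOrElse(msg, (m: Any) => unhandled(m))
}

// Production actors extend this instead of BaseActor, so any of them can be
// sent chaos messages in a test environment.
abstract class ChaosMarmosetActor extends BaseActor {
  override def unhandled(msg: Any): Unit = msg match {
    case InjectFailure(reason) => throw new ChaosException(reason) // let it crash
    case InjectDelay(millis)   => Thread.sleep(millis)              // simulate a slow actor
    case other                 => super.unhandled(other)
  }
}

// An ordinary actor that knows nothing about chaos.
final class EchoActor extends ChaosMarmosetActor {
  var last: Option[String] = None
  def receive: PartialFunction[Any, Unit] = { case s: String => last = Some(s) }
}
```

In a real actor system the thrown exception would go to the supervisor, which is the let-it-crash path this kind of test is meant to exercise.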
Horizontal scalability by pushing all state into the database
Actors are about data – actors are stateful – impedance mismatch
Stateless services cannot update the same data as actor
Autonomy – single responsibility
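Why stateless services and stateful actors cannot safely update the same data is easiest to see as a lost update. This is a deliberately simplified, hypothetical sketch: it runs on a single thread, a mutable Map stands in for the database, and AgentRow and its fields are invented for illustration.

```scala
import scala.collection.mutable

// Hypothetical database row for an agent.
final case class AgentRow(id: String, status: String, jobs: Int)

object LostUpdateDemo {
  def run(): AgentRow = {
    val db = mutable.Map("agent-1" -> AgentRow("agent-1", "idle", 0))

    // The actor loads its state once and then works from memory (actors are stateful).
    var actorState = db("agent-1")

    // A stateless service updates the same row behind the actor's back.
    db("agent-1") = db("agent-1").copy(status = "disabled")

    // Later the actor persists its own view, silently undoing that change.
    actorState = actorState.copy(status = "busy", jobs = actorState.jobs + 1)
    db("agent-1") = actorState
    db("agent-1")
  }
}
```

The "disabled" status is gone from the final row. Making the actor the single writer for that data, so that other services send it messages instead of writing the row, removes this class of bug; that is one reading of "autonomy – single responsibility".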
If your actors write to the database
We want agent assignments to be consistent
Are banking transactions ACID? No – suspense accounts – reconciliation – compensating transactions
Must handle failure cases
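The banking analogy suggests giving up on one atomic update spanning the pool, the state holder and the database. Instead you record intent, confirm it, and reconcile whatever is left over. The following is a hypothetical sketch of that idea applied to job assignments; Ledger, Pending, Confirmed and Compensated are illustrative names, and timestamps are plain Long values so the example stays deterministic.

```scala
// Hypothetical assignment ledger modeled on a bank's suspense account:
// an assignment is recorded as Pending first and only becomes Confirmed
// when the agent acknowledges it.
sealed trait Entry { def jobId: String }
final case class Pending(jobId: String, agent: String, since: Long) extends Entry
final case class Confirmed(jobId: String, agent: String) extends Entry
// The assignment was undone and the job requeued.
final case class Compensated(jobId: String, agent: String) extends Entry

final case class Ledger(entries: Map[String, Entry], queue: Vector[String]) {
  def assign(jobId: String, agent: String, now: Long): Ledger =
    copy(entries = entries + (jobId -> Pending(jobId, agent, now)), queue = queue.filterNot(_ == jobId))

  def confirm(jobId: String): Ledger = entries.get(jobId) match {
    case Some(Pending(_, agent, _)) => copy(entries = entries + (jobId -> Confirmed(jobId, agent)))
    // Late or duplicate ack: ignored here. A late ack after compensation
    // would need its own handling (e.g. telling the agent to stop).
    case _ => this
  }

  // Reconciliation: anything still pending after `timeout` is assumed lost
  // and compensated by putting the job back on the queue.
  def reconcile(now: Long, timeout: Long): Ledger = {
    val stale = entries.values.collect { case p: Pending if now - p.since > timeout => p }.toList.sortBy(_.jobId)
    stale.foldLeft(this) { (l, p) =>
      l.copy(entries = l.entries + (p.jobId -> Compensated(p.jobId, p.agent)), queue = l.queue :+ p.jobId)
    }
  }
}
```

For example, if job-1 and job-2 are assigned at time 0 and only job-2 is acknowledged, reconcile(now = 100, timeout = 30) compensates job-1 and puts it back on the queue while job-2 stays confirmed.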