2. ● Joshua Harlow
○ Yahoo! dev. for ~7 years
○ OpenStack dev. for ~3.5 years
○ Master Trouble-maker
○ Oslo, kazoo, anvil, taskflow, cloudinit… more …
● Min Pae
○ HP dev. for ~7 months
○ OpenStack dev. for ~7 months
○ Lead spell checker
○ Cue, taskflow, automaton… period ...
Who are we
3. - Distributed systems are complex
- Scale out, resumption, resiliency, HA, visibility
into active work … are not easily solvable
problems (some learn this the hard way)
- Understanding your states and workflows (and
managing, transitioning and running) is key to
solving many of these complex problems
The problem
4. - Declarative workflows
- Persisted execution state (checkpoints)
- Automatic migration of workflows/jobs
- Horizontal scalability
- Magic!
Taskflow does ...
5. - Atom (task and retry execution units)
- Flow (composition unit)
- Engine (work execution <-> persistence)
- Job / Jobboard (work discovery/ownership unit)
- Conductor (‘conducts’ automated
discovery/ownership, flow construction and
execution)
Taskflow is ...
6. - Execution unit
- Has
- dependencies (“requires”)
- data (“provides”)
- Defines
- execute(...) - business logic
- revert(...) - exception handler
Taskflow - Atom:Task
import sys
import time

from taskflow import task


class TakeABottleDown(task.Task):
    def execute(self, bottles_left):
        # Business logic: announce, pause, then hand back the new count.
        sys.stdout.write('Take one down, ')
        sys.stdout.flush()
        time.sleep(TAKE_DOWN_DELAY)
        return bottles_left - 1

    def revert(self, **kwargs):
        ...

class PassItAround(task.Task):
    ...

class Conclusion(task.Task):
    ...
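The "requires"/"provides" pairing can be illustrated without the library. The following is a minimal stdlib-only sketch, not TaskFlow's own classes: `Task`, `run_task` and the `storage` dict are hypothetical names, and the sketch assumes that a task's requirements are simply the argument names of its `execute` method.

```python
import inspect


class Task:
    """Hypothetical stand-in for a task; not TaskFlow's own class."""

    def __init__(self, provides=None):
        self.provides = provides
        # Every execute() argument (self excluded) is treated as a required input.
        self.requires = set(inspect.signature(self.execute).parameters)

    def execute(self, **kwargs):
        raise NotImplementedError

    def revert(self, **kwargs):
        """Undo hook; does nothing unless a subclass overrides it."""


class TakeABottleDown(Task):
    def execute(self, bottles_left):
        return bottles_left - 1


def run_task(task, storage):
    """Feed the task its required inputs and store the result under 'provides'."""
    inputs = {name: storage[name] for name in task.requires}
    result = task.execute(**inputs)
    if task.provides is not None:
        storage[task.provides] = result
    return result


storage = {"bottles_left": 99}
bottle_task = TakeABottleDown(provides="bottles_left")
run_task(bottle_task, storage)
```

Naming inputs and outputs this way is what lets a runner wire tasks together by data rather than by call order.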
7. - Controls retry semantics of
associated flow (and subflows and
…)
- Has
- dependencies (“requires”)
- data (“provides”)
- Defines
- execute(...) - business logic
- revert(...) - exception handler
- on_failure(...) - decision maker
that affects retry semantics
Taskflow - Atom:Retry
from taskflow import retry


class Retry1(retry.Retry):
    def execute(self, param1):
        print(param1)
        return param1 + ' printed'

    def revert(self, **kwargs):
        print("reverting...")

    def on_failure(self, **kwargs):
        # 'attempts' is kept as written on the slide: the number of tries so far.
        if self.attempts < 5:
            return retry.RETRY
        else:
            return retry.REVERT_ALL
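To make the "decision maker" role concrete, here is a minimal stdlib-only sketch of a retry controller driving a loop. It is not TaskFlow's retry API: `UpToFive`, `run_with_retry`, the string constants, and the `attempts` argument are all hypothetical names chosen for illustration.

```python
RETRY = "RETRY"
REVERT_ALL = "REVERT_ALL"


class UpToFive:
    """Hypothetical retry controller; the name and shape are assumptions."""

    def __init__(self, max_attempts=5):
        self.max_attempts = max_attempts

    def on_failure(self, attempts):
        # Keep retrying until the attempt budget is used up.
        return RETRY if attempts < self.max_attempts else REVERT_ALL


def run_with_retry(work, controller):
    """Call work() until it succeeds or the controller gives up."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return work()
        except Exception:
            if controller.on_failure(attempts) == REVERT_ALL:
                raise  # give up; the caller would now revert prior work


calls = []


def flaky():
    """Fails twice, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient failure")
    return "done"


result = run_with_retry(flaky, UpToFive())
```

The work itself never decides whether to try again; that policy lives entirely in the controller, so it can be swapped without touching the tasks.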
8. - Composition of Tasks
- Defines transitions between Tasks
- Allows implicit and explicit
dependencies
- Required methods(?)
- add(...) - add (and link) task(s),
flow(s)
- iter_links(...) - iterator over the
created links (links are created
during add)
Taskflow - Flow
from taskflow.patterns import linear_flow

s = linear_flow.Flow('bottle-song')
take_bottle = TakeABottleDown(...)
pass_it = PassItAround(...)
next_bottles = Conclusion(...)
s.add(take_bottle, pass_it, next_bottles)
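The relationship between add(...) and iter_links(...) can be sketched for the linear case. This is a stdlib-only illustration, not TaskFlow's `linear_flow.Flow`; the `LinearFlow` class and its internals are hypothetical, and plain strings stand in for task objects.

```python
class LinearFlow:
    """Hypothetical linear composition; not TaskFlow's linear_flow.Flow."""

    def __init__(self, name):
        self.name = name
        self._items = []
        self._links = []

    def add(self, *items):
        for item in items:
            if self._items:
                # Links are created during add: previous item -> new item.
                self._links.append((self._items[-1], item))
            self._items.append(item)
        return self

    def iter_links(self):
        return iter(self._links)


flow = LinearFlow("bottle-song")
flow.add("take_bottle", "pass_it", "next_bottles")
links = list(flow.iter_links())
```

A different pattern (unordered, graph) would differ only in which links add(...) creates; the consumer of iter_links(...) stays the same.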
9. - Run flows (and associated tasks) to completion
- Decompose flows into a DAG
- Edge dependencies mandated by flow(s)
patterns are always retained
- Prepare persistence layer
- Run tasks/retries as they are ready
- Optionally in parallel (and/or remotely)...
- Save and fetch results from persistence layer
and run next tasks/retries (and repeat)
- State machine based:
- http://docs.openstack.org/developer/taskflow/states.html#engine
Taskflow - Engine(s)
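The run loop described on this slide (decompose into a DAG, run what is ready, save results, repeat) can be sketched with the standard library's `graphlib` (Python 3.9+). This is an illustration under assumptions, not TaskFlow's engine: `run_engine`, the `tasks`/`deps` dicts, and the plain-dict `storage` standing in for the persistence layer are all hypothetical.

```python
from graphlib import TopologicalSorter


def run_engine(tasks, deps, storage):
    """Run callables in dependency order, saving each result in storage.

    tasks: name -> callable taking the storage dict
    deps:  name -> set of names that must finish first
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in tasks})
    sorter.prepare()
    order = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies satisfied,
        # so a real engine could run this batch in parallel.
        for name in sorter.get_ready():
            storage[name] = tasks[name](storage)  # save the result
            order.append(name)
            sorter.done(name)  # unlock whatever depended on this task
    return order


tasks = {
    "a": lambda s: 1,
    "b": lambda s: s["a"] + 1,
    "c": lambda s: s["a"] + 2,
    "d": lambda s: s["b"] + s["c"],
}
deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
storage = {}
order = run_engine(tasks, deps, storage)
```

Here "b" and "c" become ready together once "a" finishes, which is the point where optional parallel or remote execution would apply.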
10. - Place where work can be placed by producer
entities and consumed/owned (and worked on) by
other consumer entities
- Similar to a job queue but builds in liveness
semantics/capabilities (and semantics expect single
ownership via a claim concept)
- If an owner of a unit of work dies, the claim on the
work they are performing is automatically lost and
freed up for others
- Typically tied to a unit of work (being a flow) and its
optional persistence location (so that prior work can
be resumed)
Taskflow - Job(s) & Jobboard(s)
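The claim and liveness semantics can be sketched as leases that expire unless renewed. This is a stdlib-only illustration, not TaskFlow's jobboard API: the `JobBoard` class, its method names, the `ttl` parameter, and the explicit `now` argument (passed in so the example is deterministic) are all assumptions.

```python
class JobBoard:
    """Hypothetical in-memory board; claims expire unless renewed."""

    def __init__(self, ttl=5.0):
        self.ttl = ttl
        self._jobs = {}    # job name -> details
        self._claims = {}  # job name -> (owner, expiry time)

    def post(self, name, details):
        self._jobs[name] = details

    def claim(self, name, owner, now):
        """Take (or renew) single ownership of a posted job."""
        if name not in self._jobs:
            return False
        holder = self._claims.get(name)
        if holder is not None and holder[0] != owner and holder[1] > now:
            return False  # someone else holds a live claim
        self._claims[name] = (owner, now + self.ttl)
        return True

    def consume(self, name, owner, now):
        """Remove a finished job; only its live owner may do so."""
        holder = self._claims.get(name)
        if holder is None or holder[0] != owner or holder[1] <= now:
            raise RuntimeError("only the live owner may consume a job")
        del self._jobs[name]
        del self._claims[name]

    def unclaimed(self, now):
        """Jobs with no claim, or whose claim has lapsed."""
        return [n for n in self._jobs
                if n not in self._claims or self._claims[n][1] <= now]


board = JobBoard(ttl=5.0)
board.post("job-1", {"flow": "bottle-song"})
first = board.claim("job-1", "conductor-a", now=0.0)
second = board.claim("job-1", "conductor-b", now=1.0)
# conductor-a never renews, so its claim lapses and the job is free again.
third = board.claim("job-1", "conductor-b", now=6.0)
```

An owner that stays alive would call claim(...) again before the lease expires; an owner that dies simply stops renewing, and the work frees itself.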
11. ● Essentially an advanced/specialized job processor
- Connects to a jobboard
- Periodically fetches contents of jobboard
- Attempts to claim a job
- Constructs the job's work (flow, other...)
- Performs the job's work (using engines of various
types and persistence backends to enable
reliability)
- Removes job (on completion)
- (rinse and repeat)
● Expected to be scaled out (run as many conductors
as needed/desired)
Taskflow - Conductor
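The conductor's cycle (fetch, claim, construct, perform, remove) can be sketched as a single pass over a board. This is a stdlib-only illustration, not TaskFlow's conductor: `Board`, `conduct_once`, the `factories` mapping, and the job detail keys are hypothetical, and claims here never expire, to keep the example short.

```python
class Board:
    """Hypothetical minimal board: posted jobs plus a claim table."""

    def __init__(self):
        self.jobs = {}
        self.claims = {}

    def claim(self, name, owner):
        # setdefault only records the owner if nobody claimed the job yet.
        return self.claims.setdefault(name, owner) == owner


def conduct_once(board, owner, factories):
    """One conductor pass: claim what is free, build it, run it, remove it."""
    finished = []
    for name, details in list(board.jobs.items()):
        if not board.claim(name, owner):
            continue  # another conductor owns this job
        # Construct the job's work from the details stored with the job.
        work = factories[details["factory"]](**details["kwargs"])
        work()  # perform the work
        del board.jobs[name]  # remove the job on completion
        del board.claims[name]
        finished.append(name)
    return finished


log = []


def make_song(bottles):
    return lambda: log.append(f"{bottles} bottles")


board = Board()
board.jobs["job-1"] = {"factory": "song", "kwargs": {"bottles": 99}}
board.jobs["job-2"] = {"factory": "song", "kwargs": {"bottles": 98}}
board.claims["job-2"] = "conductor-b"  # already owned elsewhere
done = conduct_once(board, "conductor-a", {"song": make_song})
```

Running more conductors means more callers of the same pass against a shared board; the claim step is what keeps them from duplicating work.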
13. - Jobs and Jobboards provide work ownership and
work discovery
- Horizontal scaling via conductors
- Automatic migration of work between conductors
- Persistence of execution state enables resumption
and automated ownership transfer
- When a conductor fails, its in-progress job(s) are
picked up (and resumed from the last checkpoint) by the
next worker that frees up; there is no need to wait for
the failed worker to come back.
- Turn your software off safely and handle failures
gracefully!
Wherefore Taskflow?
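Resumption from the last checkpoint can be sketched as follows. This is a stdlib-only illustration under assumptions: `run_with_checkpoints`, the `checkpoint` dict (standing in for a persistence backend), and the `fail_at` switch used to simulate a dying worker are all hypothetical.

```python
def run_with_checkpoints(steps, checkpoint, fail_at=None):
    """Run (name, func) steps in order, skipping any already checkpointed."""
    value = checkpoint["value"]
    for name, func in steps:
        if name in checkpoint["done"]:
            continue  # finished before the failure; do not redo it
        if name == fail_at:
            raise RuntimeError(f"worker died before {name}")
        value = func(value)
        # Record progress only after the step has completed.
        checkpoint["done"].append(name)
        checkpoint["value"] = value
    return value


steps = [
    ("take", lambda v: v - 1),
    ("pass", lambda v: v),
    ("conclude", lambda v: f"{v} bottles"),
]
checkpoint = {"done": [], "value": 99}

try:
    run_with_checkpoints(steps, checkpoint, fail_at="pass")  # first worker dies
except RuntimeError:
    pass

# A second worker picks up the same checkpoint and continues from "pass".
final = run_with_checkpoints(steps, checkpoint)
```

Because "take" is skipped on the second run, the count is decremented once rather than twice.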
14. - Declarative definition of work
- Decouples what (Task, Flow) from how (Engine)
- Coroutines are not separable from the
surrounding code, and cannot be automatically
parallelized
- Separation of declaration and execution allows
flexibility in execution strategy
- Engine tracks execution state and transitions
- Parallel (green)threaded execution…
- Remote worker execution…
Wherefore Taskflow? (cont.)
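Separating the "what" from the "how" can be sketched by running one declaration under two strategies. This is a stdlib-only illustration, not TaskFlow's engines: `run_serial`, `run_threaded`, and the `tasks` dict of independent callables are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_serial(tasks):
    """Strategy 1: run each declared task one after another."""
    return {name: func() for name, func in tasks.items()}


def run_threaded(tasks, max_workers=4):
    """Strategy 2: run the same declared tasks on a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(func) for name, func in tasks.items()}
        return {name: future.result() for name, future in futures.items()}


# The declaration says only what to compute, not how to schedule it.
tasks = {"square": lambda: 7 * 7, "cube": lambda: 2 ** 3}

serial_results = run_serial(tasks)
threaded_results = run_threaded(tasks)
```

Nothing in `tasks` changes between the two calls, which is the flexibility the declarative split is meant to buy.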
15. - Not strongly tied to Python as a language (for
better or worse); concepts are easily transferable to
Java/Go/….
- À la carte: use what you want
- Use the basics until you are ready to use
jobboards, or select a local engine until you are
ready to run remote workers…
Wherefore Taskflow? (cont.)
17. Notifications
Remote task workers
Dynamic flow modification
Real time dashboard of atom/flow/job transitions (WIP)
Applications that can be paused
DDoS your favorite site (joke)
The potential is nearly limitless!!
Wherefore Taskflow? (cont.)