Provenance Annotation and Analysis to Support Process Re-Computation

Paper talk given at the IPAW 2018 conference.
The paper is available here: http://www.lamsade.dauphine.fr/~belhajjame/Program_files/pdf/s3_p1.pdf

  1. Provenance Annotation and Analysis to Support Process Re-Computation. Jacek Cała, Paolo Missier, School of Computing, Newcastle University, UK
  2. Problem Outline • Consider a process P, e.g. the following NGS pipeline [5]. [5] Cała, J., Marei, E., Xu, Y., Takeda, K., Missier, P.: Scalable and efficient whole-exome data processing using workflows on the cloud. Future Generation Computer Systems (Jan 2016).
  3. Problem Outline • Only rarely is P a static entity. • Usually, a variety of elements in P change: • data dependencies, • software tools and their dependencies, • [out of scope] the structure of P. • Changes in the elements of P => the need to update past outcomes of P => the need for re-computation.
  4. The Re-Computation Framework • Controls the re-computation of processes; proposed earlier in [6]. • The core of the framework is the re-computation loop. [6] Cała, J., Missier, P.: Selective and recurring re-computation of Big Data analytics tasks: insights from a Genomics case study. Big Data Research (2018); in press.
  5. Re-Computation Process • Here we consider a single pass of the loop and focus on the first step only (S1).
  6. Preliminaries • The ProvONE model: prospective + retrospective provenance [7]. • Set of software and data dependencies: D = {a0, b0, …} • Process, input and execution configuration: E(P, x, V) • Version change event: C = {a_n → a_{n-1}} • Composite version change event: C = {a_n → a_{n-1}, b_m → b_{m-1}, …} • Change front. • Re-computation front. • Restart tree. [7] Cuevas-Vicenttín, V., Ludäscher, B., Missier, P., et al.: ProvONE: A PROV Extension Data Model for Scientific Workflow Provenance (2016).
  7. Change Front • The accumulation of change events over a specified time window. • Timeline figure, change events in order along t: C0 {a1 → a0}, C1 {b1 → b0}, C2 {a2 → a1, c1 → c0}, C3 {a3 → a2, c2 → c1}, C4 {d1 → d0}, C5 {b2 → b1}. • Change fronts: CF3 = {a3, b1, c2}, CF5 = {a3, b2, c2, d1}. • Executions shown: E(…, [a0, b0, e0]), E(…, [a0, b1, d0]), E(…, [a2, b1, c1]).
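The accumulation on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' Prolog implementation: the function names `parse_version` and `change_front`, and the encoding of each change event as a list of `(new, old)` pairs, are assumptions made for the example. The event data is taken from the timeline above, and the sketch assumes the left-hand side of each arrow is the newer version, which is what CF3 and CF5 suggest.

```python
import re


def parse_version(item):
    """Split an identifier such as 'a3' into ('a', 3)."""
    name, version = re.fullmatch(r"([a-z]+)(\d+)", item).groups()
    return name, int(version)


def change_front(events):
    """Accumulate change events into the latest version of each changed element.

    Each event is a list of (new, old) pairs, e.g. [("a1", "a0")].
    """
    latest = {}
    for event in events:
        for new, _old in event:
            name, version = parse_version(new)
            if version > latest.get(name, -1):
                latest[name] = version
    return {f"{name}{version}" for name, version in latest.items()}


events = [
    [("a1", "a0")],                # C0
    [("b1", "b0")],                # C1
    [("a2", "a1"), ("c1", "c0")],  # C2
    [("a3", "a2"), ("c2", "c1")],  # C3
    [("d1", "d0")],                # C4
    [("b2", "b1")],                # C5
]

cf3 = change_front(events[:4])  # window C0..C3
cf5 = change_front(events)      # window C0..C5
```

With the slide's events, `cf3` and `cf5` reproduce the two change fronts shown in the figure.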
  8. Re-computation Front • Over time the population of executions grows: • some of them may result from re-executions, • some of them may be user-initiated and may use historical versions of elements. • Looking for the transitive closure of the elements' derivation is too broad => find out which of the past executions really need an update.
  9. Re-computation Front • We use wasInformedBy(..., [prov:type="recomp:re-execution"]) to denote a ReComp-initiated re-execution.
  10. Re-computation Front
  11. Re-computation Front … user-initiated
  12. Re-computation Front
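One possible reading of the re-computation front, sketched in Python under stated assumptions: an execution belongs to the front if it used a version older than the change front for at least one element, and it has not already been refreshed by a ReComp-initiated re-execution. The `Execution` class, its `re_execution_of` field (standing in for the wasInformedBy statement typed `recomp:re-execution`) and the execution E3 are hypothetical. The rule itself is a simplification, not the authors' Prolog algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Execution:
    exec_id: str
    used: dict                             # element name -> version number used
    re_execution_of: Optional[str] = None  # id of the execution this one refreshes


def recomputation_front(executions, change_front):
    """Return (exec_id, outdated) pairs for executions that still need an update.

    `change_front` maps element name -> latest version number.
    `outdated` maps element name -> (version used, latest version).
    """
    # Executions already refreshed by a ReComp-initiated re-execution.
    refreshed = {e.re_execution_of for e in executions if e.re_execution_of}
    front = []
    for e in executions:
        if e.exec_id in refreshed:
            continue
        outdated = {
            name: (version, change_front[name])
            for name, version in e.used.items()
            if name in change_front and version < change_front[name]
        }
        if outdated:
            front.append((e.exec_id, outdated))
    return front


executions = [
    Execution("E0", {"a": 0, "b": 0, "e": 0}),
    Execution("E1", {"a": 0, "b": 1, "d": 0}),
    Execution("E2", {"a": 2, "b": 1, "c": 1}),
    # Hypothetical re-execution of E0 with newer versions of a and b.
    Execution("E3", {"a": 3, "b": 1, "e": 0}, re_execution_of="E0"),
]
cf5 = {"a": 3, "b": 2, "c": 2, "d": 1}

front = recomputation_front(executions, cf5)
```

In this example E0 is left out because E3 supersedes it, while E3 itself stays on the front because it used b1 and the change front holds b2.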
  13. Restart Tree • Re-computation front handles single executions well. • What if the process is more complex than that? • pipeline, workflow, complex hierarchical workflow… cf. the NGS pipeline.
  17. Restart Tree • Re-computation front handles single executions well. • What if the process is more complex than that? • pipeline, workflow, complex hierarchical workflow… cf. the NGS pipeline. • The provenance trace includes multiple interrelated executions. • During re-execution we have to combine all of them within a single context: the top-level execution.
  18. Restart Tree • To build a restart tree we rely on the provone:wasPartOf statements. CF = {b2, e1}
  19. Restart Tree • Captures the vertical dimension of a single execution: the transitive closure of the wasPartOf relation. RT ≝ {Execution, [DataChange], [Children]} CF = {b2, e1}
  20. Restart Tree • Captures the vertical dimension of a single execution: the transitive closure of the wasPartOf relation. RT = {E0, [], [{SE0, [], [{SSE1, [⟨b2 → b0⟩], []}, {SSE3, [⟨e1 → e0⟩], []}]}, {SE1, …}, …]} CF = {b2, e1}
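The recursive structure RT ≝ {Execution, [DataChange], [Children]} can be sketched in Python as follows. This is an illustrative sketch, not the authors' Prolog code. The `restart_tree` function, the encoding of wasPartOf statements as `(child, parent)` pairs and the sample trace (including SSE0 and SSE2) are assumptions. Branches with no data change are pruned here, and the SE1 branch that the slide elides is assumed to carry no change, because its content is not given.

```python
def restart_tree(execution, was_part_of, changes):
    """Build (execution, data_changes, children) following wasPartOf downwards.

    `was_part_of` is a list of (child, parent) pairs.
    `changes` maps an execution id to its list of (new, old) version changes.
    Returns None when neither the execution nor its descendants are affected.
    """
    children = []
    for child, parent in was_part_of:
        if parent == execution:
            subtree = restart_tree(child, was_part_of, changes)
            if subtree is not None:
                children.append(subtree)
    own_changes = changes.get(execution, [])
    if not own_changes and not children:
        return None
    return (execution, own_changes, children)


# Hypothetical trace: E0 is the top-level execution with two sub-executions.
was_part_of = [
    ("SE0", "E0"), ("SE1", "E0"),
    ("SSE0", "SE0"), ("SSE1", "SE0"), ("SSE2", "SE0"), ("SSE3", "SE0"),
]
# Sub-executions that used versions older than the change front CF = {b2, e1}.
changes = {"SSE1": [("b2", "b0")], "SSE3": [("e1", "e0")]}

tree = restart_tree("E0", was_part_of, changes)
```

For this trace the result keeps only the path E0 → SE0 → {SSE1, SSE3}, so a restart can stay within the top-level execution while touching only the affected parts.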
  21. The algorithm • Combines all three aspects: • the change front, • the re-computation front and • the restart tree. • For a given change front, it produces the re-computation front, which includes a set of restart trees; each tree refers to a single top-level execution and contains only the parts related to the change(s). • Enables ReComp to identify the minimal set of executions that may be affected by the change(s). • The remaining executions are either not affected at all or were refreshed previously.
  22. Re-Computation Process • Enables difference and impact analysis of the executions on the front and their partial re-execution.
  23. Difference and Impact Analysis (figure with edges labelled <<hasSubProgram>>)
  24. Conclusions • We address the problem of the re-computation of: • complex hierarchical processes, • run over a cohort of input data samples, • with multiple points of change, • in an open system that allows users to initiate (re-)executions at any time. • The solution starts from the changes observed, in contrast to previous work, e.g. smart re-run and workflow caching. • We proposed a simple algorithm to find the re-computation front: • written in Prolog, • very effective (response in the order of 1–100 ms), • available on GitHub. • The algorithm is the initial step in further scope identification and execution optimisation.
  25. Thank you! http://www.recomp.org.uk