Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.
The Power of Snapshots
Stateful Stream Processing
with Apache Flink
Stephan Ewen
QCon San Francisco, 2017
1
InfoQ.com: News & Community Site
• Over 1,000,000 software developers, architects and CTOs read the site world-
wide every...
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practi...
2
Original creators of
Apache Flink®
dA Platform 2
Open Source Apache Flink +
dA Application Manager
3
Stream Processing
What changes faster? Data or Query?
4
Data changes slowly
compared to fast
changing queries
ad-hoc queries, data explorati...
Batch Processing
5
Stream Processing
6
7
Stateful
Stream Processing
Moving State into the Processors
8
Application
External DBstate
Stateless
Stream Processor
Stateful
Stream Processor
Appli...
9
Apache Flink
Apache Flink in a Nutshell
10
Queries
Applications
Devices
etc.
Database
Stream
File / Object
Storage
Stateful computation...
11
Event Streams State (Event) Time Snapshots
The Core Building Blocks
real-time and
hindsight
complex
business logic
cons...
Stateful Event & Stream Processing
12
Source
Transformation
Transformation
Sink
val lines: DataStream[String] = env.addSou...
Stateful Event & Stream Processing
13
Scalable embedded state
Access at memory speed &
scales with parallel operators
Event time and Processing Time
14
Event Producer Message Queue
Flink
Data Source
Flink
Window Operator
partition 1
partiti...
Powerful Abstractions
15
Process Function (events, state, time)
DataStream API (streams, windows)
Stream SQL / Tables (dyn...
16
Distributed Snapshots
Event Sourcing + Memory Image
17
event log
persists events
(temporarily)
event /
command
Process
main memory
update local
...
Event Sourcing + Memory Image
18
Recovery: Restore snapshot and replay events
since snapshot
event log
persists events
(te...
Consistent Distributed Snapshots
19
Scalable embedded state
Access at memory speed &
scales with parallel operators
Checkpoint Barriers
20
Consistent Distributed Snapshots
21
Trigger checkpoint Inject checkpoint barrier
Consistent Distributed Snapshots
22
Take state snapshot Trigger state
copy-on-write
Consistent Distributed Snapshots
23
Persist state snapshots Persist
snapshots
asynchronously
Processing pipeline continues
Consistent Distributed Snapshots
25
Re-load state
Reset positions
in input streams
Rolling back computation
Re-processing
Consistent Distributed Snapshots
26
Restore to different
programs
27
Checkpoints and Savepoints
in Apache Flink
Speed or Operability?
28
Fast snapshots
Checkpoint
Flexible
Operations on
Snapshots
Savepoint
What to optimize for?
Savepoints: Opt. for Operability
 Self contained: No references to other checkpoints
 Canonical format: Switch between s...
Checkpoints: Opt. for Efficiency
 Often incremental:
• Snapshot only diff from last snapshot
• Reference older snapshots,...
31
What else are snapshots /
checkpoints good for?
What users built on checkpoints
 Upgrades and Rollbacks
 Cross Datacenter Failover
 State Archiving
 State Bootstrappi...
33
Distributed Snapshots
and side effects
Transaction coordination for side fx
34
One snapshot can transactionally move
data between different systems
Snapshots may...
Transaction coordination for side fx
 Similar to a distributed 2-phase commit
 Coordinated by asynchronous checkpoints, ...
36
Distributed Snapshots
and Application Architectures
(A Philosophical Monologue)
Good old centralized architecture
37
The big mean
central database
$$$
The grumpy
DBA
Application Application Application ...
Stateful Stream Proc. & Applications
38
Application Application Application
Application Application
decentralized infrastr...
Stateless Application Containers
39
State management
is nasty, let's pretend we don't
have to do it
Stateless Application Containers
40
Kudos to Kiki Carter
for the Broccoli
Metaphor
Broccoli (state management)
is nasty, l...
Stateful Stream Proc. to the rescue
41
Application
Sensor
APIs
Application
Application
Application
very simple: state is j...
Compute, State, and Storage
42
Classic tiered architecture Streaming architecture
database
layer
compute
layer
application...
Performance
43
synchronous reads/writes
across tier boundary
asynchronous writes
of large blobs
all modifications
are loca...
Consistency
44
distributed transactions
at scale typically
at-most / at-least once
exactly once
per state =1 =1
Classic ti...
Scaling a Service
45
separately provision additional
database capacity
provision compute
and state together
Classic tiered...
Rolling out a new Service
46
provision a new database
(or add capacity to an existing one)
simply occupies some
additional...
Time, Completeness, Out-of-order
47
?
event time clocks
define data completeness
event time timers
handle actions for
out-...
Stateful Stream Processing
48
Application
Sensor
APIs
Application
Application
Application
very simple: state is just part
...
The Challenges with that:
 Upgrades are stateful, need consistency
• application evolution and bug fixes
 Migration of a...
50
Consistent Distributed
Snapshots
The answer
(my personal and obviously biased take)
51
Payments Dashboard
Demo Time!
52
Thank you very much 
(shameless plug)
We are hiring!
data-artisans.com/careers
Appendix
54
55
Details about Snapshots
and Transactional
Side Effects
Exactly-once via Transactions
56
chk-1 chk-2
TXN-1
✔chk-1 ✔chk-2
TXN-2
✘
TXN-3
Side effect
✔ global ✔ global
Transaction fails after local snapshot
57
chk-1 chk-2
TXN-1
✔chk-1
TXN-2
✘
TXN-3
✔ global
Side effect
Transaction fails before commit…
58
chk-1 chk-2
TXN-1
✔chk-1
TXN-2
✘
TXN-3
✔ global ✔ global
Side effect
… commit on recovery
59
chk-2
TXN-2 TXN-3
✔ global
recover
TXN handle
chk-3
Side effect
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
distributed-stream-processin...
Prochain SlideShare
Chargement dans…5
×

The Power of Distributed Snapshots in Apache Flink

444 vues

Publié le

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2FcTsIA.

Stephan Ewen talks about how Apache Flink handles stateful stream processing and how to manage distributed stream processing and data driven applications efficiently with Flink's checkpoints and savepoints. Filmed at qconsf.com.

Stephan Ewen is a PMC member and one of the original creators of Apache Flink, and co-founder and CTO of data Artisans.

Publié dans : Technologie
  • Soyez le premier à commenter

The Power of Distributed Snapshots in Apache Flink

  1. 1. The Power of Snapshots Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1
  2. 2. InfoQ.com: News & Community Site • Over 1,000,000 software developers, architects and CTOs read the site world- wide every month • 250,000 senior developers subscribe to our weekly newsletter • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • 2 dedicated podcast channels: The InfoQ Podcast, with a focus on Architecture and The Engineering Culture Podcast, with a focus on building • 96 deep dives on innovative topics packed as downloadable emags and minibooks • Over 40 new content items per week Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ distributed-stream-processing-flink
  3. 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon San Francisco www.qconsf.com
  4. 4. 2 Original creators of Apache Flink® dA Platform 2 Open Source Apache Flink + dA Application Manager
  5. 5. 3 Stream Processing
  6. 6. What changes faster? Data or Query? 4 Data changes slowly compared to fast changing queries ad-hoc queries, data exploration, ML training and (hyper) parameter tuning Batch Processing Use Case Data changes fast application logic is long-lived continuous applications, data pipelines, standing queries, anomaly detection, ML evaluation, … Stream Processing Use Case
  7. 7. Batch Processing 5
  8. 8. Stream Processing 6
  9. 9. 7 Stateful Stream Processing
  10. 10. Moving State into the Processors 8 Application External DBstate Stateless Stream Processor Stateful Stream Processor Application state
  11. 11. 9 Apache Flink
  12. 12. Apache Flink in a Nutshell 10 Queries Applications Devices etc. Database Stream File / Object Storage Stateful computations over streams real-time and historic fast, scalable, fault tolerant, in-memory, event time, large state, exactly-once Historic Data Streams Application
  13. 13. 11 Event Streams State (Event) Time Snapshots The Core Building Blocks real-time and hindsight complex business logic consistency with out-of-order data and late data forking / versioning / time-travel
  14. 14. Stateful Event & Stream Processing 12 Source Transformation Transformation Sink val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer09(…)) val events: DataStream[Event] = lines.map((line) => parse(line)) val stats: DataStream[Statistic] = stream .keyBy("sensor") .timeWindow(Time.seconds(5)) .sum(new MyAggregationFunction()) stats.addSink(new RollingSink(path)) Streaming Dataflow Source Transform Window (state read/write) Sink
  15. 15. Stateful Event & Stream Processing 13 Scalable embedded state Access at memory speed & scales with parallel operators
  16. 16. Event time and Processing Time 14 Event Producer Message Queue Flink Data Source Flink Window Operator partition 1 partition 2 Event Time Ingestion Time Processing Time Broker Time Event time, Watermarks, as in the Dataflow model
  17. 17. Powerful Abstractions 15 Process Function (events, state, time) DataStream API (streams, windows) Stream SQL / Tables (dynamic tables) Stream- & Batch Data Processing High-level Analytics API Stateful Event- Driven Applications val stats = stream .keyBy("sensor") .timeWindow(Time.seconds(5)) .sum((a, b) -> a.add(b)) def processElement(event: MyEvent, ctx: Context, out: Collector[Result]) = { // work with event and state (event, state.value) match { … } out.collect(…) // emit events state.update(…) // modify state // schedule a timer callback ctx.timerService.registerEventTimeTimer(event.timestamp + 500) } Layered abstractions to navigate simple to complex use cases
  18. 18. 16 Distributed Snapshots
  19. 19. Event Sourcing + Memory Image 17 event log persists events (temporarily) event / command Process main memory update local variables/structures periodically snapshot the memory
  20. 20. Event Sourcing + Memory Image 18 Recovery: Restore snapshot and replay events since snapshot event log persists events (temporarily) Process
  21. 21. Consistent Distributed Snapshots 19 Scalable embedded state Access at memory speed & scales with parallel operators
  22. 22. Checkpoint Barriers 20
  23. 23. Consistent Distributed Snapshots 21 Trigger checkpoint Inject checkpoint barrier
  24. 24. Consistent Distributed Snapshots 22 Take state snapshot Trigger state copy-on-write
  25. 25. Consistent Distributed Snapshots 23 Persist state snapshots Persist snapshots asynchronously Processing pipeline continues
  26. 26. Consistent Distributed Snapshots 25 Re-load state Reset positions in input streams Rolling back computation Re-processing
  27. 27. Consistent Distributed Snapshots 26 Restore to different programs
  28. 28. 27 Checkpoints and Savepoints in Apache Flink
  29. 29. Speed or Operability? 28 Fast snapshots Checkpoint Flexible Operations on Snapshots Savepoint What to optimize for?
  30. 30. Savepoints: Opt. for Operability  Self contained: No references to other checkpoints  Canonical format: Switch between state structures  Efficiently re-scalable: Indexed by key group  Future: More self-describing serialization format for to archiving / versioning (like Avro, Thrift, etc.) 29
  31. 31. Checkpoints: Opt. for Efficiency  Often incremental: • Snapshot only diff from last snapshot • Reference older snapshots, compaction over time  Format specific to state backend: • No extra copied or re-encoding • Not possible to switch to another state backend between checkpoints  Compact serialization: Optimized for speed/space, not long term archival and evolution  Key goups not indexed: Re-distribution may be more expensive 30
  32. 32. 31 What else are snapshots / checkpoints good for?
  33. 33. What users built on checkpoints  Upgrades and Rollbacks  Cross Datacenter Failover  State Archiving  State Bootstrapping  Application Migration  Spot Instance Region Arbitrage  A/B testing  … 32
  34. 34. 33 Distributed Snapshots and side effects
  35. 35. Transaction coordination for side fx 34 One snapshot can transactionally move data between different systems Snapshots may include side effects
  36. 36. Transaction coordination for side fx  Similar to a distributed 2-phase commit  Coordinated by asynchronous checkpoints, no voting delays  Basic algorithm: • Between checkpoints: Produce into transaction or Write Ahead Log • On operator snapshot: Flush local transaction (vote-to-commit) • On checkpoint complete: Commit transactions (commit) • On recovery: check and commit any pending transactions 35
  37. 37. 36 Distributed Snapshots and Application Architectures (A Philosophical Monologue)
  38. 38. Good old centralized architecture 37 The big mean central database $$$ The grumpy DBA Application Application Application Application
  39. 39. Stateful Stream Proc. & Applications 38 Application Application Application Application Application decentralized infrastructure DevOps decentralized responsibilities still involves managing databases
  40. 40. Stateless Application Containers 39 State management is nasty, let's pretend we don't have to do it
  41. 41. Stateless Application Containers 40 Kudos to Kiki Carter for the Broccoli Metaphor Broccoli (state management) is nasty, let's pretend we don't have to eat do it
  42. 42. Stateful Stream Proc. to the rescue 41 Application Sensor APIs Application Application Application very simple: state is just part of the application
  43. 43. Compute, State, and Storage 42 Classic tiered architecture Streaming architecture database layer compute layer application state + backup compute + stream storage and snapshot storage (backup) application state
  44. 44. Performance 43 synchronous reads/writes across tier boundary asynchronous writes of large blobs all modifications are local Classic tiered architecture Streaming architecture
  45. 45. Consistency 44 distributed transactions at scale typically at-most / at-least once exactly once per state =1 =1 Classic tiered architecture Streaming architecture
  46. 46. Scaling a Service 45 separately provision additional database capacity provision compute and state together Classic tiered architecture Streaming architecture provision compute
  47. 47. Rolling out a new Service 46 provision a new database (or add capacity to an existing one) simply occupies some additional backup space Classic tiered architecture Streaming architecture provision compute and state together
  48. 48. Time, Completeness, Out-of-order 47 ? event time clocks define data completeness event time timers handle actions for out-of-order data Classic tiered architecture Streaming architecture
  49. 49. Stateful Stream Processing 48 Application Sensor APIs Application Application Application very simple: state is just part of the application
  50. 50. The Challenges with that:  Upgrades are stateful, need consistency • application evolution and bug fixes  Migration of application state • cluster migration, A/B testing  Re-processing and reinstatement • fix corrupt results, bootstrap new applications  State evolution (schema evolution) 49
  51. 51. 50 Consistent Distributed Snapshots The answer (my personal and obviously biased take)
  52. 52. 51 Payments Dashboard Demo Time!
  53. 53. 52 Thank you very much  (shameless plug)
  54. 54. We are hiring! data-artisans.com/careers
  55. 55. Appendix 54
  56. 56. 55 Details about Snapshots and Transactional Side Effects
  57. 57. Exactly-once via Transactions 56 chk-1 chk-2 TXN-1 ✔chk-1 ✔chk-2 TXN-2 ✘ TXN-3 Side effect ✔ global ✔ global
  58. 58. Transaction fails after local snapshot 57 chk-1 chk-2 TXN-1 ✔chk-1 TXN-2 ✘ TXN-3 ✔ global Side effect
  59. 59. Transaction fails before commit… 58 chk-1 chk-2 TXN-1 ✔chk-1 TXN-2 ✘ TXN-3 ✔ global ✔ global Side effect
  60. 60. … commit on recovery 59 chk-2 TXN-2 TXN-3 ✔ global recover TXN handle chk-3 Side effect
  61. 61. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ distributed-stream-processing-flink

×