Kostas Tzoumas - Stream Processing with Apache Flink®

Kostas Tzoumas
@kostas_tzoumas
Big Data Ldn
November 4, 2016
Stream Processing with Apache Flink®

Debunking Some Common Myths in Stream Processing

data Artisans
Original creators of Apache Flink®
Providers of the dA Platform, a supported Flink distribution

Outline
 What is data streaming
 Myth 1: The throughput/latency tradeoff
 Myth 2: Exactly once not possible
 Myth 3: Streaming is for (near) real-time
 Myth 4: Streaming is hard

Reconsideration of data architecture
 Better app isolation
 More real-time reaction to events
 Robust continuous applications
 Process both real-time and historical data

[Figure: three “app state” boxes, an event log, and a query service]

What is (distributed) streaming
 Computations on never-ending “streams” of data records (“events”)
 Stream processor distributes the computation in a cluster
[Figure: four boxes labeled “Your code”]

What is stateful streaming
 Computation and state
• E.g., counters, windows of past events, state machines, trained ML models
 Result depends on history of stream
 Stateful stream processor gives the tools to manage state
• Recover, roll back, version, upgrade, etc.
[Figure: “Your code” with attached state]
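
As a rough illustration of "computation and state", the plain-Python sketch below keeps one counter per key and emits the running count after every event, so each result depends on the history of the stream. It does not use Flink's API: the name `running_counts` and the in-memory dictionary are assumptions made for this example, and a stateful stream processor would additionally snapshot and restore that state.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def running_counts(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (key, count so far) for every incoming event."""
    # Hypothetical in-memory state; a real processor would manage this for you.
    state: Dict[str, int] = defaultdict(int)
    for key in events:
        state[key] += 1          # the result depends on everything seen before
        yield key, state[key]


clicks = ["home", "cart", "home", "home"]
print(list(running_counts(clicks)))
# [('home', 1), ('cart', 1), ('home', 2), ('home', 3)]
```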

What is event-time streaming
 Data records associated with timestamps (time series data)
 Processing depends on timestamps
 Event-time stream processor gives you the tools to reason about time
• E.g., handle streams that are out of order
• Core feature is watermarks: a clock to measure event time
[Figure: “Your code” with state; out-of-order events (t3, t1, t2, t4) are grouped into windows t1-t2 and t3-t4]
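
To make the watermark idea concrete, here is a small plain-Python sketch of event-time tumbling windows over an out-of-order stream. It is not Flink's API and simplifies its semantics: the function name, the parameters `window_size` and `max_out_of_orderness`, and the rule "watermark = largest timestamp seen minus the allowed out-of-orderness" are all assumptions chosen for the example.

```python
from typing import Dict, Iterable, Iterator, Tuple


def tumbling_event_time_counts(
    events: Iterable[Tuple[int, str]],
    window_size: int,
    max_out_of_orderness: int,
) -> Iterator[Tuple[int, int, int]]:
    """Yield (window_start, window_end, count) once the watermark passes window_end."""
    open_windows: Dict[int, int] = {}  # window start -> number of events so far
    watermark = float("-inf")          # event-time clock; never moves backwards
    for timestamp, _payload in events:
        start = timestamp - timestamp % window_size
        if start + window_size <= watermark:
            continue  # too late: this window has already been emitted
        open_windows[start] = open_windows.get(start, 0) + 1
        watermark = max(watermark, timestamp - max_out_of_orderness)
        # Emit every window that the watermark has now passed.
        for closed in sorted(s for s in open_windows if s + window_size <= watermark):
            yield closed, closed + window_size, open_windows.pop(closed)
    # End of a bounded input: flush whatever is still open.
    for remaining in sorted(open_windows):
        yield remaining, remaining + window_size, open_windows.pop(remaining)


# Events are (event timestamp, payload) in arrival order, which is not timestamp order.
arrivals = [(1, "a"), (3, "b"), (2, "c"), (11, "d"), (4, "e"), (13, "f"), (5, "late"), (25, "g")]
results = list(tumbling_event_time_counts(arrivals, window_size=10, max_out_of_orderness=2))
print(results)  # [(0, 10, 4), (10, 20, 2), (20, 30, 1)]
```

The event with timestamp 4 arrives after timestamp 11 but is still counted, because the watermark has not yet passed the end of its window; the event with timestamp 5 arrives after the window was emitted and is dropped in this sketch.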

What is streaming
 Continuous processing on data that is continuously generated
 I.e., pretty much all “big” data
 It’s all about state and time

Debunking some common stream processing myths

Myth 1: Throughput/latency tradeoff
 Myth 1: you need to choose between high throughput and low latency
 Physical limits
• In reality, the network determines both the achievable throughput and latency
• A well-engineered system achieves these limits

Flink performance
 10s of millions of events per second on 10s of nodes
 scaled to 1000s of nodes
 with latency in single-digit milliseconds

Myth 2: Exactly once not possible
 Exactly once: under failures, system computes result as if there was no failure
 In contrast to:
• At most once: no guarantees
• At least once: duplicates possible
 Exactly once state versus exactly once delivery
 Myth 2: Exactly once state not possible/too costly

Transactions
 “Exactly once” is transactions: either all actions succeed or none succeed
 Transactions are possible
 Transactions are useful
 Let’s not start eventual consistency all over again…

Flink checkpoints
 Periodic asynchronous consistent snapshots of application state
 Provide exactly-once state guarantees under failures
[Figure (stream_barriers.svg): checkpoint barriers n-1 and n travel inside a data stream of records (events), ordered from older to newer records; the records on either side of each barrier belong to checkpoint n-1, checkpoint n, and checkpoint n+1 respectively]
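
The effect of consistent snapshots on state can be sketched in a few lines of plain Python. This is a deliberately simplified, single-process, synchronous model, not Flink's asynchronous barrier-based mechanism: the function name, `checkpoint_every`, `fail_at`, and `restore_state` are hypothetical, and the source is assumed to be a replayable log.

```python
import copy
from typing import Dict, List, Optional, Tuple


def run_with_checkpoints(
    log: List[str],
    checkpoint_every: int,
    fail_at: Optional[int] = None,
    restore_state: bool = True,
) -> Dict[str, int]:
    """Count keys in a replayable log, surviving one simulated crash at offset fail_at."""
    checkpoint: Tuple[int, Dict[str, int]] = (0, {})  # (source offset, state snapshot)
    offset = 0
    state: Dict[str, int] = {}
    crashed = False
    while offset < len(log):
        if fail_at is not None and offset == fail_at and not crashed:
            crashed = True
            offset = checkpoint[0]                    # rewind the source
            if restore_state:
                state = copy.deepcopy(checkpoint[1])  # roll state back with it
            continue
        key = log[offset]
        state[key] = state.get(key, 0) + 1
        offset += 1
        if offset % checkpoint_every == 0:
            # Snapshot offset and state together so they stay consistent.
            checkpoint = (offset, copy.deepcopy(state))
    return state


log = list("abacabad")
no_failure = run_with_checkpoints(log, checkpoint_every=3)
exactly_once = run_with_checkpoints(log, checkpoint_every=3, fail_at=5)
at_least_once = run_with_checkpoints(log, checkpoint_every=3, fail_at=5, restore_state=False)
print(no_failure == exactly_once)   # True
print(at_least_once)                # {'a': 5, 'b': 2, 'c': 2, 'd': 1}
```

Rewinding the source without rolling the state back replays two records into counts that already include them, which is the "duplicates possible" case; restoring offset and state together gives the same result as a run with no failure.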

End-to-end exactly once
 Checkpoints double as transaction coordination mechanism
 Source and sink operators can take part in checkpoints
 Exactly once internally, "effectively once" end to end: e.g., Flink + Cassandra with idempotent updates
[Figure label: transactional sinks]
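
Why idempotent updates give "effectively once" results can be shown without any database. The sketch below is plain Python with two hypothetical sinks, an append-only list and a key-value dictionary standing in for an upsert-style store; it assumes that a replay after recovery re-sends exactly the same (key, value) updates.

```python
from typing import Dict, Iterable, List, Tuple

Update = Tuple[str, int]  # (key, absolute value to store)


def write_append(sink: List[Update], updates: Iterable[Update]) -> None:
    """Non-idempotent sink: every delivery adds a new row."""
    sink.extend(updates)


def write_upsert(sink: Dict[str, int], updates: Iterable[Update]) -> None:
    """Idempotent sink: writing the same update again leaves the same row."""
    for key, value in updates:
        sink[key] = value


updates = [("a", 1), ("b", 1), ("a", 2)]
replayed = updates + updates[1:]  # the last two updates are delivered twice after a failure

append_sink: List[Update] = []
write_append(append_sink, replayed)

upsert_sink: Dict[str, int] = {}
write_upsert(upsert_sink, replayed)

print(len(append_sink))  # 5 rows although only 3 updates were produced
print(upsert_sink)       # {'a': 2, 'b': 1}
```

The upsert sink ends in the same state as a failure-free run even though some updates were delivered more than once, which is the distinction between exactly-once state and exactly-once delivery.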

State management
 Checkpoints triple as state versioning mechanism (savepoints)
 Go back and forth in time while maintaining state consistency
 Ease code upgrades (Flink or app), maintenance, migration, debugging, what-if simulations, and A/B tests

Myth 3: Streaming and real time
 Myth 3: streaming and real-time are synonymous
 Streaming is a new model
• Essentially, state and time
• Low latency/real time is the icing on the cake

Low latency and high latency streams
[Figure: a timeline of hourly partitions from 2016-3-1 12:00 am to 2016-3-12 3:00 am, annotated with three ways of processing it: Stream (low latency), Batch (bounded stream), and Stream (high latency)]
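
The "batch is a bounded stream" view can be illustrated with a plain-Python generator: the same code runs over a finite list and over a never-ending source, and only the input decides whether the computation terminates. The name `running_sum` is an assumption for this sketch, and nothing here is Flink-specific.

```python
import itertools
from typing import Iterable, Iterator


def running_sum(stream: Iterable[int]) -> Iterator[int]:
    """Emit the sum of all values seen so far, one result per input record."""
    total = 0
    for value in stream:
        total += value
        yield total


bounded = [1, 2, 3]              # a batch: the stream happens to end
unbounded = itertools.count(1)   # 1, 2, 3, ... never ends

print(list(running_sum(bounded)))                         # [1, 3, 6]
print(list(itertools.islice(running_sum(unbounded), 3)))  # [1, 3, 6]
```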

Robust continuous applications

Accurate computation
 Batch processing is not an accurate computation model for continuous data
• Misses the right concepts and primitives
• Time handling, state across batch boundaries
 Stateful stream processing is a better model
• Real-time/low-latency is the icing on the cake
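
A small plain-Python sketch of the "state across batch boundaries" problem: counting user sessions (runs of events separated by at most `gap` minutes) gives a different answer when the same events are first cut into hourly batches. The function names, the gap of 10 minutes, and the sample timestamps are assumptions made up for this illustration.

```python
from itertools import groupby
from typing import Iterable, List


def count_sessions(timestamps: Iterable[int], gap: int) -> int:
    """Count sessions in time-ordered timestamps; a pause longer than gap starts a new one."""
    sessions = 0
    last = None  # state carried from one event to the next
    for t in timestamps:
        if last is None or t - last > gap:
            sessions += 1
        last = t
    return sessions


def count_sessions_in_batches(timestamps: List[int], gap: int, batch_size: int) -> int:
    """Same count, but each batch is processed in isolation, so state is lost at the boundary."""
    batches = groupby(timestamps, key=lambda t: t // batch_size)
    return sum(count_sessions(batch, gap) for _, batch in batches)


minutes = [50, 55, 58, 62, 65, 130]  # one session spans the 60-minute boundary

print(count_sessions(minutes, gap=10))                            # 2
print(count_sessions_in_batches(minutes, gap=10, batch_size=60))  # 3
```

The hourly version splits the session that runs from minute 50 to minute 65 in two, so it reports three sessions instead of two without raising any error.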

Myth 4: How hard is streaming?
 Myth 4: streaming is too hard to learn
 You are already doing streaming, just in an ad hoc way
 Most data is unbounded and the code changes slower than the data
• This is a streaming problem

It's about your data and code
 What's the form of your data?
• Unbounded (e.g., clicks, sensors, logs), or
• Bounded (e.g., ???*)
 What changes more often?
• My code changes faster than my data
• My data changes faster than my code
* Please help me find a great example of naturally bounded data

 If your data changes faster than your code you have a streaming problem
• You may be solving it with hourly batch jobs, depending on someone else to create the hourly batches
• You are probably living with inaccurate results without knowing it

 If your code changes faster than your data you have an exploration problem
• Using notebooks or other tools for quick data exploration is a good idea
• Once your code stabilizes you will have a streaming problem, so you might as well think of it as such from the beginning

Flink community
 > 240 contributors, 95 contributors in Flink 1.1
 42 meetups around the world with > 15,000 members
 2x-3x growth in 2015, similar in 2016

Powered by Flink
Zalando, one of the largest ecommerce companies in Europe, uses Flink for real-time business process monitoring.
King, the creator of Candy Crush Saga, uses Flink to provide data science teams with real-time analytics.
Bouygues Telecom uses Flink for real-time event processing over billions of Kafka messages per day.
Alibaba, the world's largest retailer, built a Flink-based system (Blink) to optimize search rankings in real time.
See more at flink.apache.org/poweredby.html

30 Flink applications in production for more than one year. 10 billion events (2 TB) processed daily
Complex jobs of > 30 operators running 24/7, processing 30 billion events daily, maintaining state of 100s of GB with exactly-once guarantees
Largest job has > 20 operators, runs on > 5000 vCores in 1000-node cluster, processes millions of events per second

Ongoing Flink development
[Figure: roadmap diagram]
Areas: Operations, Ecosystem, Application Features, Broader Audience
Items shown: Connectors; Session Windows; (Stream) SQL; Library enhancements; Metric System; Metrics & Visualization; Dynamic Scaling; Savepoint compatibility; Checkpoints to savepoints; More connectors; Stream SQL Windows; Large state Maintenance; Fine grained recovery; Side in-/outputs; Window DSL; Security; Mesos & others; Dynamic Resource Management; Authentication; Queryable State

A longer-term vision for Flink

Streaming use cases
Application | Technology
(Near) real-time apps | Low-latency streaming
Continuous apps | High-latency streaming
Analytics on historical data | Batch as special case of streaming
Request/response apps | Large queryable state

Request/response applications
 Queryable state: query Flink state directly instead of pushing results into a database
 Large state support and query API coming in Flink
[Figure label: queries]

In summary
 The need for streaming comes from a rethinking of data infra architecture
• Stream processing then just becomes natural
 Debunking 4 common myths
• Myth 1: The throughput/latency tradeoff
• Myth 2: Exactly once not possible
• Myth 3: Streaming is for (near) real-time
• Myth 4: Streaming is hard

Thank you!
@kostas_tzoumas
@ApacheFlink
@dataArtisans

We are hiring!
data-artisans.com/careers
