2017-18
Apache Flink
Meetup
Year Review
CONTENTS
• The Apache Flink Meetup Community
• What is Apache Flink?
• The Dataflow Programming Model
• Who is using Apache Flink?
• Last Year’s Talks (2017-2018)
• What’s new with Flink v.1.6.0 & What’s in store?
• Upcoming Meetups & more
10/9/18 | Dr. Christos Hadjinikolis | Senior ML Engineer | Data Reply UK
THE APACHE FLINK
MEETUP
COMMUNITY
• … around since 2016
• … a group of enthusiasts, excited about Flink’s potential
• … since then we have successfully run 17 meetups
• … sponsors: Data Reply UK
• … size of the community?
• 500+ members!
• ~steady growth rate
• volatile active participation
WHAT IS APACHE
FLINK?
Apache Flink is a framework and distributed processing engine for
stateful computations over unbounded and bounded data streams
• framework: … provides a standardised way to build and deploy applications.
• distributed: … a computer system (cluster) that uses more than one computer to run an application.
• stateful: … this won’t be a single sentence!
STATEFUL VS STATELESS COMPUTATIONS
State in stream processing acts as memory in operators:
• remembers information about past input;
• can be used to influence the processing of future input;
• … quite like a Markov Chain
Stateless Example:
• Consider a source stream that emits events with schema:
e = {event_id:int, event_value:int}
• Our goal is, for each event, to extract and output the event_value.
Stateful Example:
• Consider a source stream that emits events with schema:
e = {event_id:int, event_value:int}
• Our goal is to output the event_value only if it is larger than the value from the previous event.
• This requires state: the operator must remember the previous event_value.
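The two examples above can be sketched in plain Python (not Flink code). The function names and the sample stream are made up for illustration, and the sketch assumes the very first event is never emitted by the stateful version because it has no predecessor to compare against.

```python
from typing import Iterable, Iterator, Optional, TypedDict


class Event(TypedDict):
    event_id: int
    event_value: int


def stateless_extract(events: Iterable[Event]) -> Iterator[int]:
    """Stateless: each output depends only on the current event."""
    for e in events:
        yield e["event_value"]


def stateful_increasing(events: Iterable[Event]) -> Iterator[int]:
    """Stateful: emit event_value only if it exceeds the previous event's value."""
    previous: Optional[int] = None  # the operator's state
    for e in events:
        value = e["event_value"]
        # Assumption: the first event has no predecessor, so it is not emitted.
        if previous is not None and value > previous:
            yield value
        previous = value  # remember this value for the next event


# Hypothetical sample stream.
stream: list[Event] = [
    {"event_id": 1, "event_value": 5},
    {"event_id": 2, "event_value": 3},
    {"event_id": 3, "event_value": 8},
    {"event_id": 4, "event_value": 8},
    {"event_id": 5, "event_value": 10},
]

print(list(stateless_extract(stream)))    # [5, 3, 8, 8, 10]
print(list(stateful_increasing(stream)))  # [8, 10]
```

The only difference between the two functions is the `previous` variable: remove it and the second goal cannot be expressed.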
• In the definition of Apache Flink, “stateful” therefore means: … memory in operators
Flink's core is a streaming dataflow engine that provides:
• data distribution,
• communication, and
• fault tolerance
for distributed computations over data streams.
THE DATAFLOW
PROGRAMMING
MODEL
LEVELS OF ABSTRACTION
• Flink offers different levels of abstraction to develop streaming/batch
applications.
PROGRAMS &
DATAFLOWS
The basic building blocks of Flink
programs are:
• streams, and
• transformations.
PARALLEL
DATAFLOWS
• Programs in Flink are inherently
parallel and distributed.
• During execution, a stream has one
or more stream partitions, and
each operator has one or
more operator subtasks.
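As a rough illustration of partitions and subtasks, the plain-Python sketch below splits a keyed stream into partitions and lets one "subtask" per partition compute a per-key sum. The modulo-based routing, the function names, and the sample data are assumptions made for the example; Flink's actual key assignment is more involved.

```python
from collections import defaultdict


def partition_by_key(events, parallelism):
    """Route each (key, value) event to one of `parallelism` stream partitions."""
    partitions = defaultdict(list)
    for key, value in events:
        # Assumption: integer keys routed by modulo; all events for a key
        # therefore land in the same partition.
        partitions[key % parallelism].append((key, value))
    return dict(partitions)


def run_subtask(partition):
    """One operator subtask: sum the values per key within its partition."""
    totals = {}
    for key, value in partition:
        totals[key] = totals.get(key, 0) + value
    return totals


# Hypothetical sample stream of (key, value) pairs.
events = [(1, 10), (2, 20), (1, 5), (3, 7), (2, 1)]
partitions = partition_by_key(events, parallelism=2)
results = {index: run_subtask(part) for index, part in partitions.items()}
print(results[0])  # {2: 21}
print(results[1])  # {1: 15, 3: 7}
```

Because every event for a given key reaches the same subtask, each subtask can work independently of the others.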
WINDOWS
• Aggregating events (e.g., counts, sums) works differently on streams than in batch processing.
• A stream is unbounded, so aggregates must be scoped by windows.
• Windows can be time driven (example: every 30 seconds) or data driven (example: every 100 elements).
Types of windows:
• tumbling windows (no overlap),
• sliding windows (with overlap), and
• session windows (punctuated by a gap of inactivity).
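The three time-driven window types can be sketched as plain-Python assignment functions (not Flink's API). The function names are hypothetical, timestamps are assumed to be integer seconds, and a window is assumed to cover the half-open interval [start, start + size).

```python
def tumbling_start(ts, size):
    """Start of the single tumbling window that contains timestamp ts."""
    return ts - ts % size


def sliding_starts(ts, size, slide):
    """Starts of all sliding windows [start, start + size) that contain ts."""
    last_start = ts - ts % slide
    return sorted(range(last_start, ts - size, -slide))


def sessions(timestamps, gap):
    """Group timestamps into sessions separated by at least `gap` of inactivity."""
    result = []
    for ts in sorted(timestamps):
        if result and ts - result[-1][-1] < gap:
            result[-1].append(ts)   # still within the current session
        else:
            result.append([ts])     # inactivity gap reached: start a new session
    return result


print(tumbling_start(35, size=30))              # 30
print(sliding_starts(35, size=30, slide=10))    # [10, 20, 30]
print(sessions([1, 2, 3, 20, 21, 50], gap=10))  # [[1, 2, 3], [20, 21], [50]]
```

An event belongs to exactly one tumbling window, to size / slide sliding windows, and to a session whose extent is only known once the gap has elapsed.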
TIME
Different notions of
time:
• Event Time
• Ingestion Time
• Processing Time
STATEFUL OPERATIONS
• Some operations in a dataflow simply look at one individual event at a time.
• Other operations remember information across multiple events (for example, window operators). These operations are called stateful.
• The state of stateful operations is maintained in what can be thought of as an embedded key/value store.
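The "embedded key/value store" view of state can be illustrated with a small plain-Python operator that keeps a running average per key. The class name and the sample events are hypothetical, and the dictionary merely stands in for Flink's managed keyed state.

```python
class RunningAverage:
    """A stateful operator: keeps (count, total) per key in a key/value store."""

    def __init__(self):
        self._state = {}  # key -> (count, total); stand-in for keyed state

    def process(self, key, value):
        count, total = self._state.get(key, (0, 0))
        count += 1
        total += value
        self._state[key] = (count, total)  # write the updated state back
        return key, total / count


operator = RunningAverage()
outputs = [operator.process(k, v) for k, v in [("a", 10), ("b", 4), ("a", 20)]]
print(outputs)  # [('a', 10.0), ('b', 4.0), ('a', 15.0)]
```

Each key sees only its own entry in the store, which is what allows state to be partitioned together with the stream.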
WHO IS USING
APACHE FLINK?
• Alibaba, the world's largest retailer, uses a fork of Flink called Blink to optimize search rankings in real time.
• eBay's monitoring platform is powered by Flink and evaluates thousands of customizable alert rules on metrics and log streams.
• Huawei is a leading global provider of ICT infrastructure and smart devices. Huawei Cloud provides a cloud service based on Flink.
• Uber built its internal SQL-based, open-source streaming analytics platform AthenaX on Apache Flink.
Apache Flink® user survey by data Artisans
• Enterprises are investing heavily in stream
processing technology
• 87% planning to deploy more applications
powered by Apache Flink software in 2018
• 64% Machine Learning
• 34% Model Scoring
• 30% Model Training
• 27% Anomaly Detection/System Monitoring
• 25% Business Intelligence
“… the ability to react to data in the moment is
becoming a top priority among enterprises of all
sizes”
LAST YEAR’S TALKS
(2017-2018)
• Aris Koliopoulos & Alex Garella – “Panta Rhei: designing distributed
applications with streams.”
• Patrick Lucas, giving a lightning talk on “Best practices around Flink state types
(List/Map/ValueState etc).”
• Stavros Kontopoulos with “Let’s talk ML on Flink”
• Stephan Ewen (CTO & Co-Founder of Data Artisans), presenting “Stream SQL
and Realtime Applications with Apache Flink”
PANTA RHEI: DESIGNING DISTRIBUTED APPLICATIONS
WITH STREAMS
ARIS KOLIOPOULOS & ALEX GARELLA
• DriveTribe: a digital automotive community platform
founded by, and featuring content from The Grand Tour
presenters
• Users consume feeds and interact with a variety of
content: videos, images, articles
• Problem: They wanted a scalable way to produce
personalised rankings of articles for users.
What they tried:
1. Stored data in a DB store and computed the aggregate stores on the fly
o Was very slow (high read time) and didn’t scale.
2. Tried computing aggregations at write time with the intention of reducing read time:
✓ 1 write can fetch all views at once
o Not fault tolerant; if one read fails, they all fail.
o What about state mutations on the read data?
Solution: Treat event streams as the source of truth for applications. This is a powerful
alternative to using RPCs, Enterprise Messaging or a Shared Database to communicate and share
data across different applications or microservices.
1. Clients send events to the API
(John liked Jeremy’s post)
2. Events are immutable; they
capture a certain action at some
point in time
3. Every application state instance
can be modelled as a projection
of those events
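The three points above can be sketched in plain Python: events are immutable records, and each piece of application state is a function (a projection) over the same event log. The `Liked` type, its fields, the projection names, and the sample log are hypothetical and not taken from the talk.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: an event cannot be changed once created
class Liked:
    user: str
    post_author: str
    timestamp: int


def likes_received(events):
    """Projection 1: how many likes each author has received."""
    counts = Counter(e.post_author for e in events)
    return dict(counts)


def likes_given(events):
    """Projection 2: how many likes each user has given."""
    counts = Counter(e.user for e in events)
    return dict(counts)


# Hypothetical event log; the first entry is "John liked Jeremy's post".
log = [
    Liked("John", "Jeremy", 1),
    Liked("Ann", "Jeremy", 2),
    Liked("John", "Richard", 3),
]

print(likes_received(log))  # {'Jeremy': 2, 'Richard': 1}
print(likes_given(log))     # {'John': 2, 'Ann': 1}
```

Since both views are derived from the log, a lost or corrupted view can be rebuilt by replaying the events.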
BEST PRACTICES AROUND FLINK STATE TYPES
(LIST/MAP/VALUESTATE)
PATRICK LUCAS
Different types of Managed
States:
• ValueState<T>
• ListState<T>
• ReducingState<T>
• AggregatingState<IN, OUT>
• FoldingState<T, ACC>
• MapState<UK, UV>
• “The cost of very frequent updates (serialisation/deserialisation)” … illustrated how we
can use transient variables to reduce it.
• “When to use ReducingState vs AggregatingState
vs FoldingState?”
Also discussed the beta version of Queryable State.
LET’S TALK ML ON FLINK
STAVROS KONTOPOULOS
• How about running model serving “natively”, inside the Flink server?
• How? Use a dynamically controlled stream approach: models are delivered to the running
implementation via a model stream and dynamically instantiated for usage.
Proposition: Build a streaming system that allows models to be updated without interrupting
execution.
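A minimal plain-Python sketch of the dynamically controlled stream idea follows; it is not the speaker's implementation. It assumes the model stream and the data stream have already been merged into one sequence of tagged records, that a "model" is just a callable, and that data arriving before the first model is dropped.

```python
def serve(merged_stream):
    """Score data records with whichever model arrived most recently."""
    model = None  # operator state: the currently active model
    for kind, payload in merged_stream:
        if kind == "model":
            model = payload          # swap the model without stopping the loop
        elif model is not None:
            yield model(payload)     # score the data record
        # Assumption: data seen before any model has arrived is dropped.


# Hypothetical merged stream of ("model", callable) and ("data", value) records.
merged = [
    ("data", 1),
    ("model", lambda x: x * 2),
    ("data", 3),
    ("model", lambda x: x + 100),
    ("data", 3),
]

print(list(serve(merged)))  # [6, 103]
```

The same input value 3 is scored differently before and after the second model arrives, while processing never pauses.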
STREAM SQL AND REALTIME APPLICATIONS WITH
APACHE FLINK
STEPHAN EWEN (CTO & CO-FOUNDER OF DATA ARTISANS)
SQL was not designed for streams:
• Relations are bounded (multi-)sets while streams are infinite
sequences
• DBMS can access all data while streaming data arrives over time
• SQL queries return a result and end while streaming queries
continuously emit results and never end
DBMS run queries on streams all the time!
• Materialised Views (MVs) are used to speed up analytical queries
• They need to update when tables change
• MV maintenance is very similar to SQL on streams:
• Table updates are a stream of statements
• MV definitions (queries) are evaluated (continuously) on that stream
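To make the analogy concrete, the plain-Python sketch below maintains a view equivalent to a grouped COUNT and SUM by applying each table update as it arrives, instead of recomputing from the full table. The statement format, the function name, and the sample data are assumptions for illustration only.

```python
def apply_statement(view, statement):
    """Incrementally update a {category: (count, total)} view for one statement."""
    op, (category, amount) = statement
    if op not in ("INSERT", "DELETE"):
        raise ValueError(f"unsupported statement: {op}")
    sign = 1 if op == "INSERT" else -1
    count, total = view.get(category, (0, 0))
    count += sign
    total += sign * amount
    if count == 0:
        view.pop(category, None)   # no rows left in this group
    else:
        view[category] = (count, total)
    return view


# Hypothetical stream of table updates: (operation, (category, amount)).
statements = [
    ("INSERT", ("books", 10)),
    ("INSERT", ("books", 5)),
    ("INSERT", ("toys", 7)),
    ("DELETE", ("books", 10)),
]

view = {}
for statement in statements:
    apply_statement(view, statement)

print(view)  # {'books': (1, 5), 'toys': (1, 7)}
```

The view is always up to date after each statement, which is the behaviour a continuous query on a stream provides.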
What about windows?
WHAT’S NEW WITH
FLINK V.1.6.0
& WHAT’S IN STORE?
WHAT’S NEW WITH FLINK V.1.6.0
• Simplifying Apache Flink’s state with the addition of
native support for state TTL.
• Further improvements to the Streaming SQL CLI, including
simplifying the execution of streaming and batch
queries against different data sources.
• Improved Flink connectors allowing better integration with
external systems.
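The idea behind state TTL (time-to-live) can be sketched in plain Python as a value that expires a fixed time after its last write. This is not Flink's API: the class and method names are hypothetical, and the clock is passed in explicitly to keep the example deterministic. In Flink itself, TTL is configured on the state descriptor (via `StateTtlConfig`) rather than implemented by hand.

```python
class TtlValueState:
    """A single value that expires `ttl` time units after its last write."""

    def __init__(self, ttl):
        self._ttl = ttl
        self._value = None
        self._last_write = None

    def update(self, value, now):
        self._value = value
        self._last_write = now  # every write refreshes the expiry time

    def value(self, now):
        if self._last_write is None:
            return None
        if now - self._last_write >= self._ttl:
            # Expired: clean up and behave as if the state was never set.
            self._value = None
            self._last_write = None
            return None
        return self._value


state = TtlValueState(ttl=10)
state.update("x", now=0)
print(state.value(now=5))   # x
print(state.value(now=10))  # None
```

Without TTL, the application would have to track and clear stale entries itself, for example with timers.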
WHAT’S IN STORE?
• Integration of SQL and CEP
• Unified checkpoints and savepoints
• An improved Flink deployment and process model
• Fine-grained recovery from task failures
• An SQL Client to execute SQL queries against batch and streaming tables.
• Serving of machine learning models.
UPCOMING
MEETUPS & MORE
CLICKSTREAM PROCESSING AT THE FINANCIAL
TIMES
The Financial Times (FT) processes millions of customer
events per day. The ability to monitor such events in real
time is crucial for attracting new customers, monitoring
the popularity of articles and personalising experiences.
In this talk, the Flink team will show us:
• how they use Flink to process their clickstream;
• how they operate the pipeline using Docker Swarm in
AWS;
• how they keep secrets safe using Vault, and;
• how they monitor it with Prometheus and Grafana.
LIGHTNING TALKS
• Give back to the community!
• Have an idea you want to discuss?
• Have done work you want to talk about?
• Found out about a new concept and want to present it?
Come, do a lightning talk!
15 mins of pure excitement & passion!
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 

Flink Meetup September 2017 2018

  • 2. CONTENTS • The Apache Flink Meetup Community • What is Apache Flink? • The Dataflow Programming Model • Who is using Apache Flink? • Last Year’s Talks (2017-2018) • What’s new with Flink v.1.6.0 & What’s in store? • Upcoming Meetups & more 10/9/18Dr. Christos Hadjinikolis | Senior ML Engineer | Data Reply UK
  • 4. THE APACHE FLINK MEETUP COMMUNITY • … around since 2016 • … a group of enthusiasts, excited about Flink’s potential • … since then we have successfully run 17 meetups • … sponsors: Data Reply UK • … size of the community?
  • 5. THE APACHE FLINK MEETUP COMMUNITY • 500+ members! • ~steady growth rate • volatile active participation
  • 7. WHAT IS APACHE FLINK? Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams
  • 8. WHAT IS APACHE FLINK? Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams • … provides a standardised way to build and deploy applications.
  • 9. WHAT IS APACHE FLINK? Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams • … a computer system (cluster) that uses more than one computer to run an application.
  • 10. WHAT IS APACHE FLINK? Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams • … this won’t be a single sentence!
  • 11. WHAT IS APACHE FLINK? Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams • … this won’t be a single sentence!
  • 12. STATEFUL VS STATELESS COMPUTATIONS State in stream processing acts as memory in operators: • it remembers information about past input; • it can be used to influence the processing of future input; • … quite like a Markov Chain
  • 13. STATEFUL VS STATELESS COMPUTATIONS Stateless Example: • Consider a source stream that emits events with schema: e = {event_id:int, event_value:int} • Our goal is, for each event, to extract and output the event_value.
  • 14. STATEFUL VS STATELESS COMPUTATIONS Stateful Example: • Consider a source stream that emits events with schema: e = {event_id:int, event_value:int} • Our goal is to output the event_value only if it is larger than the value from the previous event. This needs state.
  • 15. WHAT IS APACHE FLINK? Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams • … memory in operators
  • 16. WHAT IS APACHE FLINK? Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams
  • 17. WHAT IS APACHE FLINK? Flink core is a streaming data flow engine that provides: • data distribution, • communication, and • fault tolerance for distributed computations over data streams
  • 19. LEVELS OF ABSTRACTION • Flink offers different levels of abstraction to develop streaming/batch applications.
  • 20. PROGRAMS & DATAFLOWS The basic building blocks of Flink programs are: • streams, and • transformations.
  • 21. PARALLEL DATAFLOWS • Programs in Flink are inherently parallel and distributed. • During execution, a stream has one or more stream partitions, and each operator has one or more operator subtasks.
  • 22. WINDOWS • Aggregating events (e.g., counts, sums) works differently on streams than in batch processing. • Data is not bounded, so we need windows. • Windows can be time driven (example: every 30 seconds) or data driven (example: every 100 elements). Types of windows: • tumbling windows (no overlap), • sliding windows (with overlap), and • session windows (punctuated by a gap of inactivity).
  • 23. TIME Different notions of time: • Event Time • Ingestion Time • Processing Time
  • 24. STATEFUL OPERATIONS • Some operations in a dataflow simply look at one individual event at a time. • Other operations remember information across multiple events (for example window operators). These operations are called stateful. • The state of stateful operations is maintained in what can be thought of as an embedded key/value store.
  • 26. • Alibaba, the world's largest retailer, uses a fork of Flink called Blink to optimize search rankings in real time. • eBay's monitoring platform is powered by Flink and evaluates thousands of customizable alert rules on metrics and log streams. • Huawei is a leading global provider of ICT infrastructure and smart devices. Huawei Cloud provides a cloud service based on Flink. • Uber built their internal SQL-based, open-source streaming analytics platform AthenaX on Apache Flink.
  • 27. Apache Flink® user survey by data Artisans • Enterprises are investing heavily in stream processing technology • 87% planning to deploy more applications powered by Apache Flink software in 2018 • 64% Machine Learning • 34% Model Scoring • 30% Model Training • 27% Anomaly Detection/System Monitoring • 25% Business Intelligence “… the ability to react to data in the moment is becoming a top priority among enterprises of all sizes”
  • 29. LAST YEAR’S TALKS (2017-18) • Aris Koliopoulos & Alex Garella – “Panta Rhei: designing distributed applications with streams.” • Patrick Lucas, giving a lightning talk on “Best practices around Flink state types (List/Map/ValueState etc).” • Stavros Kontopoulos with “Let’s talk ML on Flink” • Stephan Ewen (CTO & Co-Founder of Data Artisans), presenting “Stream SQL and Realtime Applications with Apache Flink”
  • 30. PANTA RHEI: DESIGNING DISTRIBUTED APPLICATIONS WITH STREAMS ARIS KOLIOPOULOS & ALEX GARELLA • DriveTribe: a digital automotive community platform founded by, and featuring content from, The Grand Tour presenters • Users consume feeds and interact with a variety of content: videos, images, articles • Problem: They wanted a scalable way to produce personalised rankings of articles for users.
  • 31. PANTA RHEI: DESIGNING DISTRIBUTED APPLICATIONS WITH STREAMS ARIS KOLIOPOULOS & ALEX GARELLA What they tried: 1. Stored data in a DB store and computed the aggregate stores on the fly o Was very slow (high read time) and didn’t scale. 2. Tried computing aggregations at write time with the intention of reducing read time: → 1 write can fetch all views at once o Not fault tolerant; if one read fails, they all fail. o What about state mutations on the read data?
  • 32. PANTA RHEI: DESIGNING DISTRIBUTED APPLICATIONS WITH STREAMS ARIS KOLIOPOULOS & ALEX GARELLA Solution: Treat event streams as the source of truth for applications, a powerful alternative to using RPCs, Enterprise Messaging or a Shared Database to communicate and share data across different applications or microservices 1. Clients send events to the API (John liked Jeremy’s post) 2. Events are immutable; they capture a certain action at some point in time 3. Every application state instance can be modelled as a projection of those events
  • 33. BEST PRACTICES AROUND FLINK STATE TYPES (LIST/MAP/VALUESTATE) PATRICK LUCAS Different types of Managed States: • ValueState<T> • ListState<T> • ReducingState<T> • AggregatingState<IN, OUT> • FoldingState<T, ACC> • MapState<UK, UV> • “The cost of very frequent updates (serialisation/deserialisation)” … illustrated how we can use transient variables to reduce it. • “When to use ReducingState vs AggregatingState vs FoldingState?” Also discussed the beta version of Queryable State.
  • 34. LET’S TALK ML ON FLINK STAVROS KONTOPOULOS • How about running model serving “natively”, inside the Flink server? • How? Use a dynamically controlled stream approach: models are delivered to the running implementation via a model stream and dynamically instantiated for usage. Proposition: Build a streaming system that allows models to be updated without interrupting execution
  • 35. STREAM SQL AND REALTIME APPLICATIONS WITH APACHE FLINK STEPHAN EWEN (CTO & CO-FOUNDER OF DATA ARTISANS) SQL was not designed for streams: • Relations are bounded (multi-)sets while streams are infinite sequences • DBMS can access all data while streaming data arrives over time • SQL queries return a result and end while streaming queries continuously emit results and never end
  • 36. STREAM SQL AND REALTIME APPLICATIONS WITH APACHE FLINK STEPHAN EWEN (CTO & CO-FOUNDER OF DATA ARTISANS) DBMS run queries on streams all the time! • Materialised Views (MVs) are used to speed up analytical queries • They need to be updated when tables change • MV maintenance is very similar to evaluating queries on streams: • Table updates are a stream of statements • MV definitions (queries) are evaluated (continuously) on that stream
  • 37. STREAM SQL AND REALTIME APPLICATIONS WITH APACHE FLINK STEPHAN EWEN (CTO & CO-FOUNDER OF DATA ARTISANS) What about windows?
  • 38. WHAT’S NEW WITH FLINK V.1.6.0 & WHAT’S IN STORE?
  • 39. WHAT’S NEW WITH FLINK V.1.6.0 • Simplifying Apache Flink’s state with the addition of native support for state TTL. • Further improvements to the Streaming SQL CLI, including simplifying the executions of streaming and batch queries against different data sources • Improved Flink connectors allowing better integration with external systems.
  • 40. WHAT’S IN STORE? • Integration of SQL and CEP • Unified checkpoints and savepoints • An improved Flink deployment and process model • Fine-grained recovery from task failures • An SQL Client to execute SQL queries against batch and streaming tables. • Serving of machine learning models.
  • 42. CLICKSTREAM PROCESSING AT THE FINANCIAL TIMES The Financial Times (FT) processes millions of customer events per day. The ability to monitor such events in real time is crucial for attracting new customers, monitoring the popularity of articles and personalising experiences. In this talk, the Flink team will show us: • how they use Flink to process their clickstream; • how they operate the pipeline using Docker Swarm in AWS; • how they keep secrets safe using Vault, and; • how they monitor it with Prometheus and Grafana.
  • 43. LIGHTNING TALKS • Give back to the community! • Have an idea you want to discuss? • Have done work you want to talk about? • Found out about a new concept and want to present it? Come, do a lightning talk! 15 mins of pure excitement & passion!
  • 44. Dr. Christos Hadjinikolis | Senior ML Engineer | Data Reply UK

Editor's Notes

  1. We started off in 2016. We are excited about Flink’s potential, and we want to find other people who are interested. Apache Flink is a 'streaming first' data processing engine.
  2. … active participation is something that we want to change in the future (we will discuss this further around the end of this presentation)
  3. https://dzone.com/articles/streaming-in-spark-flink-and-kafka-1
  4. Let’s break this down.
  5. Let’s break this down.
  6. This includes parallel processing in which a single computer uses more than one CPU to execute programs.
  7. This includes parallel processing in which a single computer uses more than one CPU to execute programs.
  8. This includes parallel processing in which a single computer uses more than one CPU to execute programs.
  9. At a high level, we can consider state in stream processing as memory in operators that remembers information about past input and can be used to influence the processing of future input. … like a Markov Chain: A Markov chain is "a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event"
  10. In contrast, operators in stateless stream processing only consider their current inputs, without further context and knowledge about the past. A simple example to illustrate this difference: let us consider a source stream that emits events with schema e = {event_id:int, event_value:int}. Our goal is, for each event, to extract and output the event_value. We can easily achieve this with a simple source-map-sink pipeline, where the map function extracts the event_value from the event and emits it downstream to an outputting sink. This is an instance of stateless stream processing.
  11. But what if we want to modify our job to output the event_value only if it is larger than the value from the previous event? In this case, our map function needs some way to remember the event_value from a past event, and so this is an instance of stateful stream processing. This example should demonstrate that state is a fundamental, enabling concept in stream processing that is required for a majority of interesting use cases. There are, of course, more complex kinds of state, such as keeping a state machine for detecting patterns of fraudulent financial transactions or holding a model for some machine learning application.
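The difference between notes 10 and 11 can be sketched in a few lines of plain Python. This is only an illustration, not Flink code: the function names are made up here, and the choice not to emit the very first event (which has no predecessor) is an assumption.

```python
from typing import Iterable, Iterator, Optional


def stateless_extract(events: Iterable[dict]) -> Iterator[int]:
    """Stateless map: each output depends only on the current event."""
    for event in events:
        yield event["event_value"]


def stateful_increasing(events: Iterable[dict]) -> Iterator[int]:
    """Stateful map: remembers the previous event_value between events."""
    previous: Optional[int] = None  # the operator's state
    for event in events:
        value = event["event_value"]
        # Assumption: the first event has nothing to compare to, so it is not emitted.
        if previous is not None and value > previous:
            yield value
        previous = value


events = [{"event_id": i, "event_value": v} for i, v in enumerate([3, 5, 4, 4, 9])]
print(list(stateless_extract(events)))    # [3, 5, 4, 4, 9]
print(list(stateful_increasing(events)))  # [5, 9]
```

The only difference between the two functions is the `previous` variable, which is exactly the "memory in operators" the slides refer to.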
  12. Any kind of data is produced as a stream of events. Credit card transactions, sensor measurements, machine logs, or user interactions on a website or mobile application, all of these data are generated as a stream. Unbounded streams have a start but no defined end. They do not terminate and provide data as it is generated. Unbounded streams must be continuously processed, i.e., events must be promptly handled after they have been ingested.  Bounded streams have a defined start and end. Bounded streams can be processed by ingesting all data before performing any computations.
  13. Flink is a layered system. The different layers of the stack build on top of each other and raise the abstraction level of the program representations they accept. The runtime layer receives a program in the form of a JobGraph. Both the DataStream API and the DataSet API generate JobGraphs through separate compilation processes. The DataSet API uses an optimizer to determine the optimal plan for the program, while the DataStream API uses a stream builder. The JobGraph is executed according to a variety of deployment options available in Flink (e.g., local, remote, YARN for resource management and job scheduling, etc.). Libraries and APIs that are bundled with Flink generate DataSet or DataStream API programs. These are Table for queries on logical tables, FlinkML for Machine Learning, and Gelly for graph processing.
  14. https://ci.apache.org/projects/flink/flink-docs-release-1.6/concepts/programming-model.html
  15. The lowest level abstraction simply offers stateful streaming. It is embedded into the DataStream API via the Process Function. It allows users to freely process events from one or more streams, and to use consistent, fault-tolerant state. In addition, users can register event time and processing time callbacks, allowing programs to realize sophisticated computations. In practice, most applications would not need the lowest level abstraction, but would instead program against the Core APIs like the DataStream API (bounded/unbounded streams) and the DataSet API (bounded data sets). These APIs offer the common building blocks for data processing, like various forms of user-specified transformations, joins, aggregations, windows, state, etc. The Table API is a declarative Domain Specific Language centered around tables. One can seamlessly convert between tables and DataStream/DataSet, allowing programs to mix the Table API with the DataStream and DataSet APIs. The highest level abstraction offered by Flink is SQL. This abstraction is similar to the Table API both in semantics and expressiveness, but represents programs as SQL query expressions.
  16. Conceptually a stream is a (potentially never-ending) flow of data records, and a transformation is an operation that takes one or more streams as input, and produces one or more output streams as a result. When executed, Flink programs are mapped to streaming dataflows, consisting of streams and transformation operators. Each dataflow starts with one or more sources and ends in one or more sinks. The dataflows resemble arbitrary directed acyclic graphs (DAGs).
  17. The operator subtasks are independent of one another, and execute in different threads and possibly on different machines or containers.
  18. Aggregating events (e.g., counts, sums) works differently on streams than in batch processing. For example, it is impossible to count all elements in a stream, because streams are in general infinite (unbounded). Instead, aggregates on streams (counts, sums, etc.) are scoped by windows, such as “count over the last 5 minutes”, or “sum of the last 100 elements”.
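To make the three window types from the slides concrete, here is a minimal sketch in plain Python that assigns integer timestamps (say, seconds) to tumbling, sliding and session windows. It is not Flink's window API; the function names, the half-open window convention `[start, start + size)` and the alignment of window starts to multiples of the size or slide are assumptions made for the illustration.

```python
from collections import defaultdict
from typing import Dict, List


def tumbling_counts(timestamps: List[int], size: int) -> Dict[int, int]:
    """Count events per non-overlapping window [start, start + size)."""
    counts: Dict[int, int] = defaultdict(int)
    for ts in timestamps:
        counts[(ts // size) * size] += 1
    return dict(sorted(counts.items()))


def sliding_counts(timestamps: List[int], size: int, slide: int) -> Dict[int, int]:
    """Count events per overlapping window; an event falls into several windows."""
    counts: Dict[int, int] = defaultdict(int)
    for ts in timestamps:
        start = (ts // slide) * slide  # latest window start that is <= ts
        while start > ts - size:       # window [start, start + size) still contains ts
            counts[start] += 1
            start -= slide
    return dict(sorted(counts.items()))


def session_windows(timestamps: List[int], gap: int) -> List[List[int]]:
    """Group events into sessions separated by at least `gap` of inactivity."""
    sessions: List[List[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


timestamps = [1, 4, 12, 14, 31]
print(tumbling_counts(timestamps, size=10))           # {0: 2, 10: 2, 30: 1}
print(sliding_counts(timestamps, size=10, slide=5))   # {-5: 2, 0: 2, 5: 2, 10: 2, 25: 1, 30: 1}
print(session_windows(timestamps, gap=5))             # [[1, 4], [12, 14], [31]]
```

Note that in the sliding case every event is counted twice (size / slide = 2), and that the first sliding window starts at -5 because starts are aligned to multiples of the slide.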
  19. When referring to time in a streaming program (for example to define windows), one can refer to different notions of time: Event Time is the time when an event was created. It is usually described by a timestamp in the events, for example attached by the producing sensor, or the producing service. Flink accesses event timestamps via timestamp assigners. Ingestion time is the time when an event enters the Flink dataflow at the source operator. Processing Time is the local time at each operator that performs a time-based operation.
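Why the choice of time matters can be seen with a small sketch in plain Python (not Flink code). The field names `event_time` and `processing_time` and the 10-unit tumbling window are hypothetical; the point is only that a delayed event lands in a different window depending on which clock is used.

```python
from typing import Dict, List


def count_by(events: List[dict], time_field: str, size: int) -> Dict[int, int]:
    """Count events per tumbling window, using the given timestamp field."""
    counts: Dict[int, int] = {}
    for event in events:
        start = (event[time_field] // size) * size
        counts[start] = counts.get(start, 0) + 1
    return counts


events = [
    {"id": "a", "event_time": 2, "processing_time": 3},
    {"id": "b", "event_time": 8, "processing_time": 12},   # created early, arrived late
    {"id": "c", "event_time": 11, "processing_time": 13},
]

print(count_by(events, "event_time", 10))       # {0: 2, 10: 1}
print(count_by(events, "processing_time", 10))  # {0: 1, 10: 2}
```

Event "b" belongs to the first window by event time but is counted in the second window by processing time, so the two notions of time give different aggregates for the same data.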
  20. While many operations in a dataflow simply look at one individual event at a time (for example an event parser), some operations remember information across multiple events (for example window operators). These operations are called stateful. The state of stateful operations is maintained in what can be thought of as an embedded key/value store. The state is partitioned and distributed strictly together with the streams that are read by the stateful operators. Hence, access to the key/value state is only possible on keyed streams, after a keyBy() function, and is restricted to the values associated with the current event’s key. Aligning the keys of streams and state makes sure that all state updates are local operations, guaranteeing consistency without transaction overhead. This alignment also allows Flink to redistribute the state and adjust the stream partitioning transparently.
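The idea that state is partitioned together with the keyed stream can be sketched in plain Python as follows. This is not Flink's implementation: `partition_for`, `RunningSumSubtask` and the CRC32-based routing are stand-ins chosen for the illustration (Flink uses its own key-group assignment).

```python
import zlib
from typing import Dict, List, Tuple


def partition_for(key: str, parallelism: int) -> int:
    """Route a key to a subtask; the same key always maps to the same subtask."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


class RunningSumSubtask:
    """One parallel instance of a stateful operator with its own local key/value state."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}

    def process(self, key: str, value: int) -> int:
        # Only the state of the current event's key is read and updated.
        self.state[key] = self.state.get(key, 0) + value
        return self.state[key]


def run(records: List[Tuple[str, int]], parallelism: int):
    subtasks = [RunningSumSubtask() for _ in range(parallelism)]
    output = []
    for key, value in records:
        subtask = subtasks[partition_for(key, parallelism)]  # the "keyBy" step
        output.append((key, subtask.process(key, value)))
    return output, subtasks


records = [("alice", 1), ("bob", 10), ("alice", 2), ("bob", 5)]
output, subtasks = run(records, parallelism=2)
print(output)  # [('alice', 1), ('bob', 10), ('alice', 3), ('bob', 15)]
```

Because every record for a given key reaches the same subtask, each state update is a local dictionary write and no coordination between subtasks is needed.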
  21. https://flink.apache.org/poweredby.html
  22. Enterprises are investing heavily in stream processing technology, according to the second annual Apache Flink® user survey data Artisans announced: the vast majority (87 percent) of organizations surveyed are planning to deploy more applications powered by Apache Flink software in 2018. Of dozens of new application types developers are building or planning to build, machine learning (64 percent) both for model scoring (34 percent) and model training (30 percent), anomaly detection/system monitoring (27 percent) and business intelligence/reporting (25 percent) are the most popular, followed by recommendation/decisioning engines (22 percent) and security/fraud detection (19 percent), to round out the top five. Most respondents (70 percent) say their team or department is growing and hiring in 2018. Nearly as many (59 percent) expect their team or departmental budget to increase. Drawing on these insights, it seems that the ability to react to data in the moment is becoming a top priority among enterprises of all sizes.
  23. https://www.slideshare.net/FlinkForward/flink-forward-san-francisco-2018-aris-koliopoulos-alex-garella-panta-rhei-designing-distributed-applications-with-streams
  24. https://www.slideshare.net/FlinkForward/flink-forward-san-francisco-2018-aris-koliopoulos-alex-garella-panta-rhei-designing-distributed-applications-with-streams
  25. A pattern where replayable logs, like Apache Kafka, are used for both communication as well as event storage, incorporating the retentive properties of a database in a system designed to share data across many teams, clouds and geographies.
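The "state as a projection of immutable events" idea from the Panta Rhei talk can be sketched in plain Python. The `Event` shape and the `like_counts` projection are hypothetical, loosely modelled on the "John liked Jeremy’s post" example from the slides; a real system would read the events from a replayable log rather than a tuple.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Event(NamedTuple):
    """An immutable fact: who did what to which post."""
    user: str
    action: str  # "liked" or "unliked" (assumed vocabulary)
    post: str


def like_counts(events: Iterable[Event]) -> Dict[str, int]:
    """Project the event log into a view: current number of likes per post."""
    counts: Counter = Counter()
    for event in events:
        if event.action == "liked":
            counts[event.post] += 1
        elif event.action == "unliked":
            counts[event.post] -= 1
    return {post: n for post, n in counts.items() if n > 0}


log = (
    Event("john", "liked", "jeremy-post"),
    Event("mary", "liked", "jeremy-post"),
    Event("john", "unliked", "jeremy-post"),
    Event("mary", "liked", "james-post"),
)

print(like_counts(log))      # {'jeremy-post': 1, 'james-post': 1}
print(like_counts(log[:2]))  # {'jeremy-post': 2}
```

Replaying a prefix of the log reproduces the state at that earlier point in time, which is what makes the log usable as the source of truth.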
  26. ValueState<T>: This keeps a value that can be updated and retrieved. ListState<T>: This keeps a list of elements. You can append elements and retrieve an Iterable. ReducingState<T>: This keeps a single value that represents the aggregation of all values added to the state. AggregatingState<IN, OUT>: Contrary to ReducingState, the aggregate type may be different from the type of elements that are added to the state. FoldingState<T, ACC>: Same as AggregatingState but here values are folded into an aggregate using a specified FoldFunction. MapState<UK, UV>: This keeps a list of mappings.
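The difference between ReducingState and AggregatingState described above can be illustrated with two small Python classes. These are hypothetical analogues written for this note, not Flink's state interfaces: the class names and constructor parameters are invented, and only the behaviour (same-typed reduction versus an accumulator of a different type) mirrors the description.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")
ACC = TypeVar("ACC")
OUT = TypeVar("OUT")


class ReducingStateSketch(Generic[T]):
    """Keeps one value of the same type as the inputs, combined by reduce_fn."""

    def __init__(self, reduce_fn: Callable[[T, T], T]) -> None:
        self._reduce_fn = reduce_fn
        self._value: Optional[T] = None

    def add(self, value: T) -> None:
        self._value = value if self._value is None else self._reduce_fn(self._value, value)

    def get(self) -> Optional[T]:
        return self._value


class AggregatingStateSketch(Generic[T, ACC, OUT]):
    """Keeps an accumulator whose type may differ from both input and output."""

    def __init__(self, create: Callable[[], ACC], add: Callable[[ACC, T], ACC],
                 result: Callable[[ACC], OUT]) -> None:
        self._add, self._result = add, result
        self._acc = create()

    def add(self, value: T) -> None:
        self._acc = self._add(self._acc, value)

    def get(self) -> OUT:
        return self._result(self._acc)


max_state = ReducingStateSketch(max)
avg_state = AggregatingStateSketch(
    create=lambda: (0, 0),                          # accumulator: (sum, count)
    add=lambda acc, v: (acc[0] + v, acc[1] + 1),
    result=lambda acc: acc[0] / acc[1],
)
for v in [4, 8, 6]:
    max_state.add(v)
    avg_state.add(v)

print(max_state.get())  # 8
print(avg_state.get())  # 6.0
```

A maximum fits the reducing shape because the result has the same type as the inputs; an average does not, since it needs a (sum, count) pair internally, which is the case the aggregating shape covers.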
  27. Machine Learning/Deep Learning models can be used in different ways to do predictions. My preferred way is to deploy an analytic model directly into a stream processing application (like Kafka Streams). This allows for better latency and independence of external services. However, direct deployment of models is not always a feasible approach. Sometimes it makes sense or is needed to deploy a model in another serving infrastructure like TensorFlow Serving for TensorFlow models. Model inference is then done via Remote Procedure Calls/request-response communication. Organizational or technical reasons might force this approach. Stavros asked: how about running model serving natively, in this case inside the Flink server? Use a dynamically controlled stream approach: models are delivered to the running implementation via a model stream and dynamically instantiated for usage. … as new events come through the live event stream, we’re able to evaluate them against the newly-added models (or rules).
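The dynamically controlled stream approach can be sketched in plain Python as a single loop over a merged stream of model updates and data events. This is not the implementation from the talk: the `("model", ...)` / `("data", ...)` message tagging, the use of plain callables as models, and the decision to drop data that arrives before any model are all assumptions made for the illustration.

```python
from typing import Any, Callable, Iterable, Iterator, Optional, Tuple

Message = Tuple[str, Any]  # ("model", callable) or ("data", value); hypothetical tagging


def serve(messages: Iterable[Message]) -> Iterator[Any]:
    """Score data events with the most recently delivered model, without restarting."""
    model: Optional[Callable[[Any], Any]] = None  # operator state: the current model
    for kind, payload in messages:
        if kind == "model":
            model = payload          # swap the model in place
        elif model is not None:
            yield model(payload)     # score with whatever model is current
        # Assumption: data arriving before the first model is dropped.


messages = [
    ("data", 1),
    ("model", lambda x: x * 2),
    ("data", 3),
    ("model", lambda x: x + 100),
    ("data", 3),
]
print(list(serve(messages)))  # [6, 103]
```

The same input value 3 is scored differently before and after the second model arrives, and the loop never stops to pick up the update.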
  28. https://medium.com/@mustafaakin/flink-streaming-sql-example-6076c1bc91c1
  29. DBMS run queries on streams all the time! Materialised Views (MVs) are used to speed up analytical queries, and they need to be updated when tables change. MV maintenance is very similar to evaluating queries on streams: table updates are a stream of statements, and MV definitions (queries) are evaluated continuously on that stream. What is a materialised view? Whenever a query or an update addresses an ordinary view's virtual table, the DBMS converts it into queries or updates against the underlying base tables. A materialised view takes a different approach: the query result is cached as a concrete ("materialised") table (rather than a view as such) that may be updated from the original base tables from time to time. This enables much more efficient access, at the cost of extra storage and of some data being potentially out-of-date. The core concept is a dynamic table, which changes over time. Queries on dynamic tables produce new dynamic tables, which are updated based on input and do not terminate. The figure shows the process of dynamic table conversion.
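Incremental view maintenance over a stream of table changes can be sketched in plain Python. This is not Flink's Table API: the `("insert", user)` / `("delete", user)` change format and the "clicks per user" view (roughly `SELECT user, COUNT(*) ... GROUP BY user`) are assumptions chosen to keep the example small.

```python
from typing import Dict, Iterable, Iterator, Tuple

Change = Tuple[str, str]  # ("insert" | "delete", user); hypothetical change format


def maintain_click_counts(changes: Iterable[Change]) -> Iterator[Dict[str, int]]:
    """Continuously update a 'clicks per user' view as table changes stream in."""
    view: Dict[str, int] = {}
    for op, user in changes:
        delta = 1 if op == "insert" else -1
        new_count = view.get(user, 0) + delta
        if new_count > 0:
            view[user] = new_count
        else:
            view.pop(user, None)  # a row with count 0 disappears from the view
        yield dict(view)          # snapshot of the view after this change


changes = [("insert", "mary"), ("insert", "bob"), ("insert", "mary"), ("delete", "bob")]
snapshots = list(maintain_click_counts(changes))
print(snapshots[-1])  # {'mary': 2}
```

Each change updates only the affected row instead of recomputing the whole query, and the sequence of snapshots is itself a table that changes over time.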
  30. Number of clicks in the last hour
  31. Simplified state handling through native support for state TTL (time to live). This feature allows state to be cleaned up after it has expired. With Flink 1.6.0, timer state can now go out of core by storing the relevant state in RocksDB. Moreover, the team significantly improved the deletion of timers. Support for resource elasticity and different deployment scenarios (such as better container integration). Flink 1.6.0 comes with HTTP/REST-based external communication and job submission, as well as a container entrypoint for simplified bootstrapping of containerized job clusters. Further improvements to streaming SQL, including a SQL CLI that simplifies executing streaming and batch queries against different data sources, full Avro support for easily reading any kind of Avro data, and a hardened CEP library that handles significantly larger state sizes than past versions. Improved Flink connectors, allowing better integration with external systems. The additions in Flink 1.6.0 include a new StreamingFileSink, which replaces the BucketingSink from previous versions as the standard file sink, support for Elasticsearch 6.x, and different AvroDeserializationSchemas to seamlessly ingest Avro data.
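What state TTL means can be sketched in a few lines of plain Python. This is an illustration of the concept only, not Flink's implementation or API: the `TtlState` class is hypothetical, the clock is passed in explicitly as `now`, and cleanup happens lazily on read.

```python
class TtlState:
    """Keyed state whose entries expire ttl_seconds after their last write."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}  # key -> (value, time of last write)

    def put(self, key, value, now):
        # Writing a value (re)starts its time-to-live.
        self._entries[key] = (value, now)

    def get(self, key, now):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, written = entry
        if now - written >= self.ttl:
            del self._entries[key]  # expired: drop it so state does not grow forever
            return None
        return value


state = TtlState(ttl_seconds=60)
state.put("user-1", 3, now=0)
fresh = state.get("user-1", now=30)    # still within the 60-second TTL
expired = state.get("user-1", now=61)  # past the TTL, so the entry is removed
```

Without expiry, per-key state for keys that are never seen again would accumulate indefinitely, which is the problem the TTL feature addresses.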
  32. Integration of SQL and CEP, as described in FLIP-20, to allow developers to create complex event processing (CEP) patterns using SQL statements. Unified checkpoints and savepoints, as described in FLIP-10, to allow savepoints to be triggered automatically. This matters for program updates and error handling, because savepoints allow the user to modify both the job and the Flink version, whereas checkpoints can only be recovered with the same job. An improved Flink deployment and process model, as described in FLIP-6, to allow for better integration of Flink with cluster managers and deployment technologies such as Mesos, Docker, and Kubernetes. Fine-grained recovery from task failures, as described in FLIP-1, to improve recovery efficiency by re-executing only failed tasks, reducing the amount of state that Flink needs to transfer on recovery. A SQL Client, as described in FLIP-24, to add a service and a client for executing SQL queries against batch and streaming tables. Serving of machine learning models, as described in FLIP-23, to add a library that allows users to apply offline-trained machine learning models to data streams.