Flink Meetup September 2017-2018

Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Prochain SlideShare
Infochimps Cloudcon 2012
Infochimps Cloudcon 2012
Chargement dans…3
×

Consultez-les par la suite

1 sur 44 Publicité
Publicité

Plus De Contenu Connexe

Diaporamas pour vous (20)

Similaire à Flink Meetup Septmeber 2017 2018 (20)

Publicité

Plus récents (20)

Flink Meetup Septmeber 2017 2018

  1. 2017-18 Apache Flink Meetup Year Review
  2. CONTENTS • The Apache Flink Meetup Community • What is Apache Flink? • The Dataflow Programming Model • Who is using Apache Flink? • Last Year’s Talks (2017-2018) • What’s new with Flink v.1.6.0 & What’s in store? • Upcoming Meetups & more 10/9/18 Dr. Christos Hadjinikolis | Senior ML Engineer | Data Reply UK
  3. THE APACHE FLINK MEETUP COMMUNITY
  4. THE APACHE FLINK MEETUP COMMUNITY • … around since 2016 • … a group of enthusiasts, excited about Flink’s potential • … since then we have successfully run 17 meetups • … sponsors: Data Reply UK • … size of the community?
  5. THE APACHE FLINK MEETUP COMMUNITY • 500+ members! • ~steady growth rate • volatile active participation
  6. WHAT IS APACHE FLINK?
  7. WHAT IS APACHE FLINK? Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams
  8. WHAT IS APACHE FLINK? Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams • … provides a standardised way to build and deploy applications.
  9. WHAT IS APACHE FLINK? Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams • … a computer system (cluster) that uses more than one computer to run an application.
  10. WHAT IS APACHE FLINK? Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams • … this won’t be a single sentence!
  11. WHAT IS APACHE FLINK? Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams • … this won’t be a single sentence!
  12. STATEFUL VS STATELESS COMPUTATIONS State in stream processing is like memory in operators: • it remembers information about past input; • it can be used to influence the processing of future input; • … quite like a Markov Chain
  13. STATEFUL VS STATELESS COMPUTATIONS Stateless Example: • Consider a source stream that emits events with schema: e = {event_id:int, event_value:int} • Our goal is, for each event, to extract and output the event_value.
  14. STATEFUL VS STATELESS COMPUTATIONS Stateful Example: • Consider a source stream that emits events with schema: e = {event_id:int, event_value:int} • Our goal is to output the event_value only if it is larger than the value from the previous event.
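The difference between the two examples can be sketched in plain Python (a framework-agnostic illustration, not Flink's actual API — the function names are made up for this sketch):

```python
def stateless_extract(events):
    """Stateless: each event is handled in isolation."""
    for e in events:
        yield e["event_value"]

def stateful_filter(events):
    """Stateful: remembers the previous event_value between events."""
    previous = None  # this variable plays the role of operator state
    for e in events:
        value = e["event_value"]
        if previous is not None and value > previous:
            yield value
        previous = value

events = [{"event_id": 1, "event_value": 5},
          {"event_id": 2, "event_value": 3},
          {"event_id": 3, "event_value": 7}]
print(list(stateless_extract(events)))  # [5, 3, 7]
print(list(stateful_filter(events)))    # [7]
```

The stateless map needs no memory at all, while the stateful filter must carry `previous` across events — which is exactly the per-operator memory that Flink manages, checkpoints and restores for you.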
  15. WHAT IS APACHE FLINK? Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams • … memory in operators
  16. WHAT IS APACHE FLINK? Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams
  17. WHAT IS APACHE FLINK? Flink’s core is a streaming dataflow engine that provides: • data distribution, • communication, and; • fault tolerance; for distributed computations over data streams
  18. THE DATAFLOW PROGRAMMING MODEL
  19. LEVELS OF ABSTRACTION • Flink offers different levels of abstraction to develop streaming/batch applications.
  20. PROGRAMS & DATAFLOWS The basic building blocks of Flink programs are: • streams and; • transformations.
  21. PARALLEL DATAFLOWS • Programs in Flink are inherently parallel and distributed. • During execution, a stream has one or more stream partitions, and each operator has one or more operator subtasks.
  22. WINDOWS • Aggregating events (e.g., counts, sums) works differently on streams than in batch processing. • Data is not bounded, so we need windows. • Windows can be time driven (example: every 30 seconds) or data driven (example: every 100 elements). Types of windows: • tumbling windows (no overlap), • sliding windows (with overlap), and; • session windows (punctuated by a gap of inactivity).
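The contrast between tumbling and sliding windows can be sketched with a count-based (data-driven) example in plain Python — a minimal illustration of the semantics, not Flink's window API:

```python
def tumbling(stream, size):
    """Non-overlapping windows of `size` elements; emit the sum of each window."""
    window = []
    for x in stream:
        window.append(x)
        if len(window) == size:
            yield sum(window)
            window = []  # start a fresh window: no overlap

def sliding(stream, size, slide):
    """Overlapping windows: every `slide` elements, sum the last `size` elements."""
    buf = []
    for i, x in enumerate(stream, 1):
        buf.append(x)
        if i >= size and (i - size) % slide == 0:
            yield sum(buf[-size:])  # windows share elements: overlap

data = [1, 2, 3, 4, 5, 6]
print(list(tumbling(data, 3)))    # [6, 15]
print(list(sliding(data, 3, 1)))  # [6, 9, 12, 15]
```

Each element lands in exactly one tumbling window, but in up to `size / slide` sliding windows; a session window would instead close whenever the gap between consecutive events exceeds a threshold.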
  23. TIME Different notions of time: • Event Time • Ingestion Time • Processing Time
  24. STATEFUL OPERATIONS • Some operations in a dataflow simply look at one individual event at a time. • Other operations remember information across multiple events (for example, window operators). These operations are called stateful. • The state of stateful operations is maintained in what can be thought of as an embedded key/value store.
  25. WHO IS USING APACHE FLINK?
  26. • Alibaba, the world’s largest retailer, uses a fork of Flink called Blink to optimize search rankings in real time. • eBay’s monitoring platform is powered by Flink and evaluates thousands of customizable alert rules on metrics and log streams. • Huawei is a leading global provider of ICT infrastructure and smart devices. Huawei Cloud provides Cloud Service based on Flink. • Uber built their internal SQL-based, open-source streaming analytics platform AthenaX on Apache Flink.
  27. Apache Flink® user survey by data Artisans • Enterprises are investing heavily in stream processing technology • 87% planning to deploy more applications powered by Apache Flink software in 2018 • 64% Machine Learning • 34% Model Scoring • 30% Model Training • 27% Anomaly Detection/System Monitoring • 25% Business Intelligence “… the ability to react to data in the moment is becoming a top priority among enterprises of all sizes”
  28. LAST YEAR’S TALKS (2017-2018)
  29. LAST YEAR’S TALKS (2017-18) • Aris Koliopoulos & Alex Garella – “Panta Rhei: designing distributed applications with streams.” • Patrick Lucas, giving a lightning talk on “Best practices around Flink state types (List/Map/ValueState etc).” • Stavros Kontopoulos with “Let’s talk ML on Flink” • Stephan Ewen (CTO & Co-Founder of data Artisans), presenting “Stream SQL and Realtime Applications with Apache Flink”
  30. PANTA RHEI: DESIGNING DISTRIBUTED APPLICATIONS WITH STREAMS ARIS KOLIOPOULOS & ALEX GARELLA • DriveTribe: a digital automotive community platform founded by, and featuring content from, The Grand Tour presenters • Users consume feeds and interact with a variety of content: videos, images, articles • Problem: They wanted a scalable way to produce personalised rankings of articles for users.
  31. PANTA RHEI: DESIGNING DISTRIBUTED APPLICATIONS WITH STREAMS ARIS KOLIOPOULOS & ALEX GARELLA What they tried: 1. Stored data in a DB store and computed the aggregates on the fly o Was very slow (high read time) and didn’t scale. 2. Tried computing aggregations at write time with the intention of reducing read time:  1 write can fetch all views at once o Not fault tolerant; if one read fails, they all fail. o What about state mutations on the read data?
  32. PANTA RHEI: DESIGNING DISTRIBUTED APPLICATIONS WITH STREAMS ARIS KOLIOPOULOS & ALEX GARELLA Solution: Treat event streams as the source of truth for applications—a powerful alternative to using RPCs, Enterprise Messaging or a Shared Database to communicate and share data across different applications or microservices 1. Clients send events to the API (John liked Jeremy’s post) 2. Events are immutable; they capture a certain action at some point in time 3. Every application state instance can be modelled as a projection of those events
  33. BEST PRACTICES AROUND FLINK STATE TYPES (LIST/MAP/VALUESTATE) PATRICK LUCAS Different types of Managed State: • ValueState<T> • ListState<T> • ReducingState<T> • AggregatingState<IN, OUT> • FoldingState<T, ACC> • MapState<UK, UV> • “The cost of very frequent updates (serialization/deserialization)” … illustrated how we can make use of transient variables to address that. • “When to use ReducingState vs AggregatingState vs FoldingState?” Also, discussed the beta version of Queryable State.
  34. LET’S TALK ML ON FLINK STAVROS KONTOPOULOS • How about running model serving “natively”, inside the Flink server? • How? Use the dynamically controlled stream approach—models are delivered to the running implementation via a model stream and dynamically instantiated for usage. Proposition: Build a streaming system that allows models to be updated without interrupting execution
  35. STREAM SQL AND REALTIME APPLICATIONS WITH APACHE FLINK STEPHAN EWEN (CTO & CO-FOUNDER OF DATA ARTISANS) SQL was not designed for streams: • Relations are bounded (multi-)sets while streams are infinite sequences • A DBMS can access all data while streaming data arrives over time • SQL queries return a result and end while streaming queries continuously emit results and never end
  36. STREAM SQL AND REALTIME APPLICATIONS WITH APACHE FLINK STEPHAN EWEN (CTO & CO-FOUNDER OF DATA ARTISANS) DBMSs run queries on streams all the time! • Materialised Views (MVs) are used to speed up analytical queries • They need to update when tables change • MV maintenance is very similar to stream processing: • Table updates are a stream of statements • MV definitions (queries) are evaluated (continuously) on that stream
  37. STREAM SQL AND REALTIME APPLICATIONS WITH APACHE FLINK STEPHAN EWEN (CTO & CO-FOUNDER OF DATA ARTISANS) What about windows?
  38. WHAT’S NEW WITH FLINK V.1.6.0 & WHAT’S IN STORE?
  39. WHAT’S NEW WITH FLINK V.1.6.0 • Simplifying Apache Flink’s state with the addition of native support for state TTL. • Further improvements to the Streaming SQL CLI, including simplifying the execution of streaming and batch queries against different data sources • Improved Flink connectors allowing better integration with external systems.
  40. WHAT’S IN STORE? • Integration of SQL and CEP • Unified checkpoints and savepoints • An improved Flink deployment and process model • Fine-grained recovery from task failures • An SQL Client to execute SQL queries against batch and streaming tables. • Serving of machine learning models.
  41. UPCOMING MEETUPS & MORE
  42. CLICKSTREAM PROCESSING AT THE FINANCIAL TIMES The Financial Times (FT) processes millions of customer events per day. The ability to monitor such events in real-time is crucial for attracting new customers, monitoring the popularity of articles and personalising experiences. In this talk, the Flink team will show us: • how they use Flink to process their clickstream; • how they operate the pipeline using Docker Swarm in AWS; • how they keep secrets safe using Vault, and; • how they monitor it with Prometheus and Grafana.
  43. LIGHTNING TALKS • Give back to the community! • Have an idea you want to discuss? • Done work you want to talk about? • Found out about a new concept and want to present it? Come, do a lightning talk! 15 mins of pure excitement & passion!
  44. Dr. Christos Hadjinikolis | Senior ML Engineer | Data Reply UK

Editor's notes

  • We started off in 2016
    We are excited about its potential, and we want to find other people who are interested. Apache Flink is a 'streaming first' data processing engine
  • … active participation is something that we want to change in the future (we will discuss this further around the end of this presentation)
  • https://dzone.com/articles/streaming-in-spark-flink-and-kafka-1
  • Lets break this down
  • Lets break this down
  • This includes parallel processing in which a single computer uses more than one CPU to execute programs.
  • This includes parallel processing in which a single computer uses more than one CPU to execute programs.
  • This includes parallel processing in which a single computer uses more than one CPU to execute programs.
  • At a high level, we can consider state in stream processing as memory in operators that remembers information about past input and can be used to influence the processing of future input.
    … like a Markov Chain: A Markov chain is "a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event"
  • In contrast, operators in stateless stream processing only consider their current inputs, without further context and knowledge about the past.
    A simple example to illustrate this difference: let us consider a source stream that emits events with schema e = {event_id:int, event_value:int}.
    Our goal is, for each event, to extract and output the event_value.
    We can easily achieve this with a simple source-map-sink pipeline, where the map function extracts the event_value from the event and emits it downstream to an outputting sink.
    This is an instance of stateless stream processing.
  • But what if we want to modify our job to output the event_value only if it is larger than the value from the previous event?
    In this case, our map function obviously needs some way to remember the event_value from a past event — and so this is an instance of stateful stream processing.
    This example should demonstrate that state is a fundamental, enabling concept in stream processing that is required for a majority of interesting use cases.
    There are of course, more complex states such as keeping a state-machine for detecting patterns for fraudulent financial transactions or holding a model for some machine learning application

  • Any kind of data is produced as a stream of events. Credit card transactions, sensor measurements, machine logs, or user interactions on a website or mobile application, all of these data are generated as a stream.
    Unbounded streams have a start but no defined end. They do not terminate and provide data as it is generated. Unbounded streams must be continuously processed, i.e., events must be promptly handled after they have been ingested. 
    Bounded streams have a defined start and end. Bounded streams can be processed by ingesting all data before performing any computations.
  • Flink is a layered system. The different layers of the stack build on top of each other and raise the abstraction level of the program representations they accept:
    The runtime layer receives a program in the form of a JobGraph.

    Both the DataStream API and the DataSet API generate JobGraphs through separate compilation processes. The DataSet API uses an optimizer to determine the optimal plan for the program, while the DataStream API uses a stream builder.
    The JobGraph is executed according to a variety of deployment options available in Flink (e.g., local, remote, YARN (resource management and job scheduling managers), etc.)
    Libraries and APIs that are bundled with Flink generate DataSet or DataStream API programs. These are Table for queries on logical tables, FlinkML for Machine Learning, and Gelly for graph processing.
  • https://ci.apache.org/projects/flink/flink-docs-release-1.6/concepts/programming-model.html
  • The lowest level abstraction simply offers stateful streaming. It is embedded into the DataStream API via the Process Function. It allows users to freely process events from one or more streams, and use consistent fault-tolerant state. In addition, users can register event time and processing time callbacks, allowing programs to realize sophisticated computations.
    In practice, most applications would not need the lowest level abstraction, but would instead program against the Core APIs like the DataStream API (bounded/unbounded streams) and the DataSet API (bounded data sets).
    These APIs offer the common building blocks for data processing, like various forms of user-specified transformations, joins, aggregations, windows, state, etc.
    The Table API is a declarative Domain Specific Language centered around tables
    One can seamlessly convert between tables and DataStream/DataSet, allowing programs to mix Table API and with the DataStream and DataSet APIs.
    And, at the highest level abstraction offered by Flink is SQL. This abstraction is similar to the Table API both in semantics and expressiveness, but represents programs as SQL query expressions.
  • Conceptually a stream is a (potentially never-ending) flow of data records, and a transformation is an operation that takes one or more streams as input, and produces one or more output streams as a result.
    When executed, Flink programs are mapped to streaming dataflows, consisting of streams and transformation operators. Each dataflow starts with one or more sources and ends in one or more sinks. The dataflows resemble arbitrary directed acyclic graphs(DAGs). 
  • The operator subtasks are independent of one another, and execute in different threads and possibly on different machines or containers.
  • Aggregating events (e.g., counts, sums) works differently on streams than in batch processing. For example, it is impossible to count all elements in a stream, because streams are in general infinite (unbounded). Instead, aggregates on streams (counts, sums, etc), are scoped by windows, such as “count over the last 5 minutes”, or “sum of the last 100 elements”
  • When referring to time in a streaming program (for example to define windows), one can refer to different notions of time:
    Event Time is the time when an event was created. It is usually described by a timestamp in the events, for example attached by the producing sensor, or the producing service. Flink accesses event timestamps via timestamp assigners.
    Ingestion time is the time when an event enters the Flink dataflow at the source operator.
    Processing Time is the local time at each operator that performs a time-based operation.
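The practical consequence of event time is that results depend on the timestamps carried inside the events, not on when the events happen to be processed. A minimal sketch in plain Python (not Flink's API; the function name is made up) assigns out-of-order events to one-second event-time tumbling windows:

```python
from collections import defaultdict

def event_time_tumbling_counts(events, window_ms):
    """Count events per event-time tumbling window of `window_ms` milliseconds."""
    counts = defaultdict(int)
    for ts, _payload in events:
        window_start = (ts // window_ms) * window_ms  # window the event belongs to
        counts[window_start] += 1
    return dict(counts)

# Events arrive out of order: processing order != event-time order.
events = [(1000, "a"), (3500, "b"), (1500, "c"), (2500, "d")]
result = event_time_tumbling_counts(events, 1000)
print(dict(sorted(result.items())))  # {1000: 2, 2000: 1, 3000: 1}
```

With processing time, the same four events would instead be bucketed by arrival time, so the counts would change whenever the stream is replayed with different delays — event time makes the result reproducible.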
  • While many operations in a dataflow simply look at one individual event at a time (for example an event parser), some operations remember information across multiple events (for example window operators). These operations are called stateful.

    The state of stateful operations is maintained in what can be thought of as an embedded key/value store. The state is partitioned and distributed strictly together with the streams that are read by the stateful operators. Hence, access to the key/value state is only possible on keyed streams, after a keyBy() function, and is restricted to the values associated with the current event’s key. Aligning the keys of streams and state makes sure that all state updates are local operations, guaranteeing consistency without transaction overhead. This alignment also allows Flink to redistribute the state and adjust the stream partitioning transparently.
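The key point of that paragraph — state partitioned by key, with every update local to the current event's key — can be sketched in plain Python (an illustration of the idea, not Flink's keyed-state API):

```python
from collections import defaultdict

def keyed_running_sum(events):
    """Per-key running sum over a stream of (key, value) pairs."""
    state = defaultdict(int)   # stands in for the embedded key/value store
    out = []
    for key, value in events:  # imagine the stream after a keyBy(key)
        state[key] += value    # access restricted to the current event's key
        out.append((key, state[key]))
    return out

events = [("a", 1), ("b", 10), ("a", 2), ("b", 5)]
print(keyed_running_sum(events))  # [('a', 1), ('b', 10), ('a', 3), ('b', 15)]
```

Because each update touches only `state[key]`, the store can be split along key ranges across machines and repartitioned transparently, which is exactly the property the note describes.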
  • https://flink.apache.org/poweredby.html
  • Enterprises are investing heavily in stream processing technology, according to the second annual Apache Flink® user survey that data Artisans announced: the vast majority (87 percent) of organizations surveyed are planning to deploy more applications powered by Apache Flink software in 2018. Of dozens of new application types developers are building or planning to build, machine learning (64 percent) both for model scoring (34 percent) and model training (30 percent), anomaly detection/system monitoring (27 percent) and business intelligence/reporting (25 percent) are the most popular, followed by recommendation/decisioning engines (22 percent) and security/fraud detection (19 percent), to round out the top five.

    Most respondents (70 percent) say their team or department is growing and hiring in 2018. Nearly as many (59 percent) expect their team or departmental budget to increase.

    Drawing on these insights it seems like the ability to react to data in the moment is becoming a top priority among enterprises of all sizes
  • https://www.slideshare.net/FlinkForward/flink-forward-san-francisco-2018-aris-koliopoulos-alex-garella-panta-rhei-designing-distributed-applications-with-streams
  • https://www.slideshare.net/FlinkForward/flink-forward-san-francisco-2018-aris-koliopoulos-alex-garella-panta-rhei-designing-distributed-applications-with-streams

  • A pattern where replayable logs, like Apache Kafka, are used for both communication as well as event storage, incorporating the retentive properties of a database in a system designed to share data across many teams, clouds and geographies.
  • ValueState<T>: This keeps a value that can be updated and retrieved
    ListState<T>: This keeps a list of elements. You can append elements and retrieve an Iterable 
    ReducingState<T>: This keeps a single value that represents the aggregation of all values added to the state.
    AggregatingState<IN, OUT>: Contrary to ReducingState, the aggregate type may be different from the type of elements that are added to the state.
    FoldingState<T, ACC>: Same as AggregatingState but here values are folded into an aggregate using a specified FoldFunction.
    MapState<UK, UV>: This keeps a list of mappings.
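The subtle difference between ReducingState and AggregatingState is in the types: a reduce combines values of one type T into a T, while an aggregate keeps an intermediate accumulator whose type can differ from both input and output. A toy sketch in plain Python (these classes mimic the semantics only; they are not Flink's state API):

```python
class ReducingStateSketch:
    """T -> T: the stored value has the same type as the inputs."""
    def __init__(self, reduce_fn):
        self.value, self.reduce_fn = None, reduce_fn
    def add(self, v):
        self.value = v if self.value is None else self.reduce_fn(self.value, v)

class AggregatingStateSketch:
    """IN -> ACC -> OUT: here ints fold into a (sum, count) pair, out comes a float."""
    def __init__(self):
        self.acc = (0, 0)  # accumulator: (running sum, element count)
    def add(self, v):
        s, c = self.acc
        self.acc = (s + v, c + 1)
    def get(self):
        s, c = self.acc
        return s / c  # OUT is a float average, not one of the int inputs

total = ReducingStateSketch(lambda a, b: a + b)
avg = AggregatingStateSketch()
for v in [4, 8, 6]:
    total.add(v)
    avg.add(v)
print(total.value, avg.get())  # 18 6.0
```

An average is the classic case where ReducingState cannot work (you cannot reduce two averages into a correct average without the counts), which is why the accumulator type exists.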
  • Machine Learning/Deep Learning models can be used in different ways to do predictions. My preferred way is to deploy an analytic model directly into a stream processing application (like Kafka Streams). This allows for better latency and independence of external services.
    However, direct deployment of models is not always a feasible approach. Sometimes it makes sense or is needed to deploy a model in another serving infrastructure like TensorFlow Serving for TensorFlow models.
    Model Inference is then done via Remote Procedure Calls/Request-Response communication.

    Organizational or technical reasons might force this approach.


    Stavros proposed running model serving natively, in this case inside the Flink server.

    Use dynamically controlled stream approach—models are delivered to running implementation via model’s stream and dynamically instantiated for usage. 

    … as new events come through the live event stream, we’re able to evaluate them against the newly-added models (or rules). 
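The dynamically controlled stream pattern boils down to merging a control stream (model updates) with the data stream, and keeping the current model as operator state. A minimal sketch in plain Python (the merged-stream encoding and function name are made up for illustration; this is not Flink code):

```python
def serve(merged_stream):
    """Score data events with whichever model arrived most recently."""
    model = None  # the currently instantiated model, held as operator state
    for kind, payload in merged_stream:
        if kind == "model":
            model = payload           # hot-swap the model mid-stream, no restart
        elif kind == "event" and model is not None:
            yield model(payload)      # evaluate the event against the live model

stream = [
    ("model", lambda x: x * 2),    # first model is delivered on the control stream
    ("event", 3),
    ("model", lambda x: x + 100),  # updated model replaces it without interruption
    ("event", 3),
]
print(list(serve(stream)))  # [6, 103]
```

The second event is scored by the second model even though the pipeline never stopped — which is precisely the "update models without interruption of execution" proposition.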
  • https://medium.com/@mustafaakin/flink-streaming-sql-example-6076c1bc91c1
  • DBMS run queries on streams all the time!
    Materialised Views (MVs) are used to speed up analytical queries
    They need to update when tables change
    MV maintenance is very similar to stream processing:
    Table updates are a stream of statements
    MV definitions (queries) are evaluated (continuously) on that stream


    What is a materialised view?

    Whenever a query or an update addresses an ordinary view's virtual table, the DBMS converts these into queries or updates against the underlying base tables. A materialized view takes a different approach: the query result is cached as a concrete ("materialized") table (rather than a view as such) that may be updated from the original base tables from time to time. This enables much more efficient access, at the cost of extra storage and of some data being potentially out-of-date.

    Core concept is a dynamic table which changes over time
    Queries on dynamic tables produce new dynamic tables which are updated based on input and do not terminate
    In the figure you can see the process of dynamic table conversion
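The idea that MV maintenance is stream processing can be sketched in plain Python: table updates arrive as a stream, and the view (here the result of a per-key COUNT query) is updated incrementally rather than re-computed from the whole table. This is an illustration of the concept only, not Flink's Table/SQL API:

```python
from collections import defaultdict

def maintain_count_view(updates):
    """Incrementally maintain SELECT key, COUNT(*) over a stream of table updates."""
    view = defaultdict(int)  # the materialised query result
    for op, key in updates:
        if op == "insert":
            view[key] += 1   # each update statement adjusts the view in place
        elif op == "delete":
            view[key] -= 1
    return dict(view)

updates = [("insert", "a"), ("insert", "b"), ("insert", "a"), ("delete", "b")]
print(maintain_count_view(updates))  # {'a': 2, 'b': 0}
```

The query never "ends": every incoming statement produces a new version of the dynamic table, which is exactly the continuous-evaluation behaviour described above.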



  • Number of clicks in the last hour
  • Simplifying Apache Flink’s state with the addition of native support for state TTL (Time to Live). This feature allows state to be cleaned up after it has expired. With Flink 1.6.0, timer state can now go out of core by storing the relevant state in RocksDB. Moreover, the team improved the deletion of timers significantly.
    Support for resource elasticity and different deployment scenarios (such as better container integration). Flink 1.6.0 comes with HTTP/REST based external communications and job submissions as well as a container entrypoint for simplified bootstrapping of containerized job clusters.
    Further improvements to the Streaming SQL CLI, including simplifying the executions of streaming and batch queries against different data sources, adding full Avro support for reading easily any kind of Avro data and hardening Flink’s CEP library to handle significantly larger state sizes compared to past versions. 
    Improved Flink connectors allowing better integration with external systems. The additions in Flink 1.6.0 include a new StreamingFileSink that replaces the BucketingSink as the standard file sink from previous versions, support for ElasticSearch 6.x and different AvroDeserializationSchemas to seamlessly ingest Avro data.
  • Integration of SQL and CEP, as described in FLIP-20 to allow developers to create complex event processing (CEP) patterns using SQL statements.
    Unified checkpoints and savepoints, as described in FLIP-10, to allow savepoints to be triggered automatically–important for program updates for the sake of error handling because savepoints allow the user to modify both the job and Flink version whereas checkpoints can only be recovered with the same job.
    An improved Flink deployment and process model, as described in FLIP-6, to allow for better integration with Flink and cluster managers and deployment technologies such as Mesos, Docker, and Kubernetes.
    Fine-grained recovery from task failures, as described in FLIP-1 to improve recovery efficiency and only re-execute failed tasks, reducing the amount of state that Flink needs to transfer on recovery.
    An SQL Client, as described in FLIP-24 to add a service and a client to execute SQL queries against batch and streaming tables.
    Serving of machine learning models, as described in FLIP-23 to add a library that allows users to apply offline-trained machine learning models to data streams.
