SlideShare une entreprise Scribd logo
1  sur  30
Télécharger pour lire hors ligne
Connector Tips & Tricks
Eron Wright, Dell EMC
@eronwright
@ 2019 Dell EMC
Who am I?
● Tech Staff at Dell EMC
● Contributor to Pravega stream storage system
○ Dynamically-sharded streams
○ Event-time tracking
○ Transaction support
● Maintainer of Flink connector for Pravega
Overview
Topics:
● Connector Basics
● Table Connectors
● Event Time
● State & Fault Tolerance
Connector Basics
Developing a Connector
● Applications take an explicit dependency on a connector
○ Not generally built-in to the Flink environment
○ Treated as a normal application dependency
○ Consider shading and relocating your connector’s dependencies
● Possible connector repositories:
○ Apache Flink repository
○ Apache Bahir (for Flink) repository
○ Your own repository
Types of Flink Connectors
● Streaming Connectors
○ Provide sources and/or sinks
○ Sources may be bounded or unbounded
● Batch Connectors
○ Not discussed here
● Table Connectors
○ Provide tables which act as sources, sinks, or both
○ Unifies the batch and streaming programming model
○ Typically relies on a streaming and/or batch connector under the hood
○ A table’s update mode determines how a table is converted to/from a stream
■ Append Mode, Retract Mode, Upsert Mode
Key Challenges
● How to parallelize your data source/sink
○ Subdivide the source data amongst operator subtasks, e.g. by partition
○ Support parallelism changes
● How to provide fault tolerance
○ Provide exactly-once semantics
○ Support coarse- and fine-grained recovery for failed tasks
○ Support Flink checkpoints and savepoints
● How to support historical and real-time processing
○ Facilitate correct program output
○ Support event time semantics
● Security considerations
○ Safeguarding secrets
Connector Lifecycle
● Construction
○ Instantiated in the driver program (i.e. main method); must be serializable
○ Use the builder pattern to provide a DSL for your connector
○ Avoid making connections if possible
● State Initialization
○ Separate configuration from state
● Run
○ Supports both unbounded and bounded sources
● Cancel / Stop
○ Supports graceful termination (w/ savepoint)
○ May advance the event time clock to the end-of-time (MAX_WATERMARK)
Connector Lifecycle (con’t)
● Advanced: Initialize/Finalize on Job Master
○ Exclusively for OutputFormat (e.g.. file-based sinks)
○ Implement InitializeOnMaster, FinalizeOnMaster, and CleanupWhenUnsuccessful
○ Support for Steaming API added in Flink 1.9; see FLINK-1722
User-Defined Data Types
● Connectors are typically agnostic to the record data type
○ Expects application to supply type information w/ serializer
● For sources:
○ Accept a DeserializationSchema<T>
○ Implement ResultTypeQueryable<T>
● For sinks:
○ Accept a SerializationSchema<T>
● First-class support for Avro, Parquet, JSON
○ Geared towards Flink Table API
Connector Metrics
● Flink exposes a metric system for gathering and reporting metrics
○ Reporters: Flink UI, JMX, InfluxDB, Prometheus, ...
● Use the metric API in your connector to expose relevant metric data
○ Types: counters, gauges, histograms, meters
● Metrics are tracked on a per-subtask basis
● More information:
○ Flink Documentation / Debugging & Monitoring / Metrics
Connector Security
● Credentials are typically passed as ordinary program parameters
○ Beware lack of isolation between jobs in a given cluster
● Flink does have first-class support for Kerberos credentials
○ Based on keytabs (in support of long-running jobs)
○ Expects connector to use a named JAAS context
○ See: Kerberos Authentication Setup and Configuration
Table API
Summary
● The Table API is evolving rapidly
○ For new connectors, focus on supporting the Blink planner
● Table sources and sinks are generally built upon the DataStream API
● Two configuration styles - typed DSL and string-based properties
● Table formats are connector-independent
○ E.g. CSV, JSON, Avro
● A catalog encapsulates a collection of tables, views, and functions
○ Provides convenience and interactivity
● More information:
○ Docs: User-Defined Sources & Sinks
Event Time Support
Key Considerations
● Connectors play an critical role in program correctness
○ Connector internals influence the order-of-observation (in event time) and hence the practicality of
watermark generation
○ Connectors exhibit different behavior in historical vs real-time processing
● Event time skew leads to excess buffering and hence inefficiency
● There’s an inherent trade-off between latency and complexity
Global Watermark Tracking
● Flink 1.9 has a facility for tracking a global aggregate value across sub-tasks
○ Ideal for establishing a global minimum watermark
○ See StreamingRuntimeContext#getGlobalAggregateManager
● Most useful in highly dynamic sources
○ Compensates for impact of resharding, rebalancing on event time
○ Increases latency
● See Kinesis connector’s JobManagerWatermarkTracker
Source Idleness
● Downstream tasks depend on arrival of watermarks from all sub-tasks
○ Beware stalling the pipeline
● A sub-task may remove itself from consideration by idling
○ i.e. “release the hold on the event time clock”
● A source should be idled mainly for semantic reasons
○
Sink Watermark Propagation
● Consider the possibility of watermark propagation across jobs
○ Propagate upstream watermarks along with output records
○ Job 1 → (external system) → Job 2
● Sink function does have access to current watermark
○ But only when processing an input record 😞
● Solution: event-time timers
○ Chain a ProcessFunction and corresponding SinkFunction, or develop a custom operator
Practical Suggestions
● Provide an API to assign timestamps and to generate watermarks
○ Strive to isolate system internals, e.g. apply the watermark generator on a per-partition basis
○ Aggregate the watermarks into a per-subtask or global watermark
● Strive to minimize event time ‘skew’ across subtasks
○ Strategy: prioritize oldest data and pause ingestion of partitions that are too far ahead
○ See FLINK-10886 for improvements to Kinesis, Kafka connectors
● Remember: the goal is not a total ordering of elements (in event time)
State & Fault Tolerance
Working with State
● Sources are typically stateful, e.g.
○ partition assignment to sub-tasks
○ position tracking
● Use managed operator state to track redistributable units of work
○ List state - a list of redistributable elements (e.g. partitions w/ current position index)
○ Union list state - a variation where each sub-task gets the complete list of elements
● Various interfaces:
○ CheckpointedFunction - most powerful
○ ListCheckpointed - limited but convenient
○ CheckpointListener - to observe checkpoint completion (e.g. for 2PC)
Exactly-Once Semantics
● Definition: evolution of state is based on a single observation of a given element
● Writes to external systems are ideally idempotent
● For sinks, Flink provides a few building blocks:
○ TwoPhaseCommitSinkFunction - base class providing a transaction-like API (but not storage)
○ GenericWriteAheadSink - implements a WAL using the state backend (see: CassandraSink)
○ CheckpointCommitter - stores information about completed checkpoints
● Savepoints present various complications
○ User may opt to resume from any prior checkpoint, not just the most recent checkpoint
○ The connector may be reconfigured w/ new inputs and/or outputs
Advanced: Externally-Induced Sources
● Flink is still in control of initiating the overall checkpoint, with a twist!
● It allows a source to control the checkpoint barrier insertion point
○ E.g. based on incoming data or external coordination
● Hooks into the checkpoint coordinator on the master
○ Flink → Hook → External System → Sub-task
● See:
○ ExternallyInducedSource
○ WithMasterCheckpointHook
Thank You!
● Feedback welcome (e.g. via the FF app)
● See me at the Speaker’s Lounge

Contenu connexe

Tendances

Apache Software Foundation: How To Contribute, with Apache Flink as Example (...
Apache Software Foundation: How To Contribute, with Apache Flink as Example (...Apache Software Foundation: How To Contribute, with Apache Flink as Example (...
Apache Software Foundation: How To Contribute, with Apache Flink as Example (...Apache Flink Taiwan User Group
 
Francesco Versaci - Flink in genomics - efficient and scalable processing of ...
Francesco Versaci - Flink in genomics - efficient and scalable processing of ...Francesco Versaci - Flink in genomics - efficient and scalable processing of ...
Francesco Versaci - Flink in genomics - efficient and scalable processing of ...Flink Forward
 
Aljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Portable stateful big data processing in Apache BeamAljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Portable stateful big data processing in Apache BeamVerverica
 
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...Flink Forward
 
Open Source Serverless: a practical view. - Gabriele Provinciali Luca Postacc...
Open Source Serverless: a practical view. - Gabriele Provinciali Luca Postacc...Open Source Serverless: a practical view. - Gabriele Provinciali Luca Postacc...
Open Source Serverless: a practical view. - Gabriele Provinciali Luca Postacc...Codemotion
 
Stream Loops on Flink - Reinventing the wheel for the streaming era
Stream Loops on Flink - Reinventing the wheel for the streaming eraStream Loops on Flink - Reinventing the wheel for the streaming era
Stream Loops on Flink - Reinventing the wheel for the streaming eraParis Carbone
 

Tendances (6)

Apache Software Foundation: How To Contribute, with Apache Flink as Example (...
Apache Software Foundation: How To Contribute, with Apache Flink as Example (...Apache Software Foundation: How To Contribute, with Apache Flink as Example (...
Apache Software Foundation: How To Contribute, with Apache Flink as Example (...
 
Francesco Versaci - Flink in genomics - efficient and scalable processing of ...
Francesco Versaci - Flink in genomics - efficient and scalable processing of ...Francesco Versaci - Flink in genomics - efficient and scalable processing of ...
Francesco Versaci - Flink in genomics - efficient and scalable processing of ...
 
Aljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Portable stateful big data processing in Apache BeamAljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Portable stateful big data processing in Apache Beam
 
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...
 
Open Source Serverless: a practical view. - Gabriele Provinciali Luca Postacc...
Open Source Serverless: a practical view. - Gabriele Provinciali Luca Postacc...Open Source Serverless: a practical view. - Gabriele Provinciali Luca Postacc...
Open Source Serverless: a practical view. - Gabriele Provinciali Luca Postacc...
 
Stream Loops on Flink - Reinventing the wheel for the streaming era
Stream Loops on Flink - Reinventing the wheel for the streaming eraStream Loops on Flink - Reinventing the wheel for the streaming era
Stream Loops on Flink - Reinventing the wheel for the streaming era
 

Similaire à Flink Connector Development Tips & Tricks

Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...Timo Walther
 
Stream processing with Apache Flink (Timo Walther - Ververica)
Stream processing with Apache Flink (Timo Walther - Ververica)Stream processing with Apache Flink (Timo Walther - Ververica)
Stream processing with Apache Flink (Timo Walther - Ververica)KafkaZone
 
Flux architecture and Redux - theory, context and practice
Flux architecture and Redux - theory, context and practiceFlux architecture and Redux - theory, context and practice
Flux architecture and Redux - theory, context and practiceJakub Kocikowski
 
Getting Data In and Out of Flink - Understanding Flink and Its Connector Ecos...
Getting Data In and Out of Flink - Understanding Flink and Its Connector Ecos...Getting Data In and Out of Flink - Understanding Flink and Its Connector Ecos...
Getting Data In and Out of Flink - Understanding Flink and Its Connector Ecos...HostedbyConfluent
 
Timing is Everything: Understanding Event-Time Processing in Flink SQL
Timing is Everything: Understanding Event-Time Processing in Flink SQLTiming is Everything: Understanding Event-Time Processing in Flink SQL
Timing is Everything: Understanding Event-Time Processing in Flink SQLHostedbyConfluent
 
OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...
OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...
OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...NETWAYS
 
Running Flink in Production: The good, The bad and The in Between - Lakshmi ...
Running Flink in Production:  The good, The bad and The in Between - Lakshmi ...Running Flink in Production:  The good, The bad and The in Between - Lakshmi ...
Running Flink in Production: The good, The bad and The in Between - Lakshmi ...Flink Forward
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streamingdatamantra
 
Network Statistics for OpenFlow
Network Statistics for OpenFlowNetwork Statistics for OpenFlow
Network Statistics for OpenFlowMiro Cupak
 
Building a Unified Logging Layer with Fluentd, Elasticsearch and Kibana
Building a Unified Logging Layer with Fluentd, Elasticsearch and KibanaBuilding a Unified Logging Layer with Fluentd, Elasticsearch and Kibana
Building a Unified Logging Layer with Fluentd, Elasticsearch and KibanaMushfekur Rahman
 
Serverless Event Streaming with Pulsar Functions
Serverless Event Streaming with Pulsar FunctionsServerless Event Streaming with Pulsar Functions
Serverless Event Streaming with Pulsar FunctionsStreamNative
 
Log Event Stream Processing In Flink Way
Log Event Stream Processing In Flink WayLog Event Stream Processing In Flink Way
Log Event Stream Processing In Flink WayGeorge T. C. Lai
 
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...Flink Forward
 
Coordination in distributed systems
Coordination in distributed systemsCoordination in distributed systems
Coordination in distributed systemsAndrea Monacchi
 
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck - Pravega: Storage Rei...
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck -  Pravega: Storage Rei...Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck -  Pravega: Storage Rei...
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck - Pravega: Storage Rei...Flink Forward
 
When Streaming Needs Batch With Konstantin Knauf | Current 2022
When Streaming Needs Batch With Konstantin Knauf | Current 2022When Streaming Needs Batch With Konstantin Knauf | Current 2022
When Streaming Needs Batch With Konstantin Knauf | Current 2022HostedbyConfluent
 
Dynamic log processing with fluentd and konfigurator
Dynamic log processing with fluentd and konfiguratorDynamic log processing with fluentd and konfigurator
Dynamic log processing with fluentd and konfiguratorAhmad Iqbal Ali
 

Similaire à Flink Connector Development Tips & Tricks (20)

Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
 
Stream processing with Apache Flink (Timo Walther - Ververica)
Stream processing with Apache Flink (Timo Walther - Ververica)Stream processing with Apache Flink (Timo Walther - Ververica)
Stream processing with Apache Flink (Timo Walther - Ververica)
 
Flux architecture and Redux - theory, context and practice
Flux architecture and Redux - theory, context and practiceFlux architecture and Redux - theory, context and practice
Flux architecture and Redux - theory, context and practice
 
Getting Data In and Out of Flink - Understanding Flink and Its Connector Ecos...
Getting Data In and Out of Flink - Understanding Flink and Its Connector Ecos...Getting Data In and Out of Flink - Understanding Flink and Its Connector Ecos...
Getting Data In and Out of Flink - Understanding Flink and Its Connector Ecos...
 
Timing is Everything: Understanding Event-Time Processing in Flink SQL
Timing is Everything: Understanding Event-Time Processing in Flink SQLTiming is Everything: Understanding Event-Time Processing in Flink SQL
Timing is Everything: Understanding Event-Time Processing in Flink SQL
 
OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...
OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...
OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...
 
Apache flink
Apache flinkApache flink
Apache flink
 
Running Flink in Production: The good, The bad and The in Between - Lakshmi ...
Running Flink in Production:  The good, The bad and The in Between - Lakshmi ...Running Flink in Production:  The good, The bad and The in Between - Lakshmi ...
Running Flink in Production: The good, The bad and The in Between - Lakshmi ...
 
Airflow Intro-1.pdf
Airflow Intro-1.pdfAirflow Intro-1.pdf
Airflow Intro-1.pdf
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streaming
 
Network Statistics for OpenFlow
Network Statistics for OpenFlowNetwork Statistics for OpenFlow
Network Statistics for OpenFlow
 
Airflow 101
Airflow 101Airflow 101
Airflow 101
 
Building a Unified Logging Layer with Fluentd, Elasticsearch and Kibana
Building a Unified Logging Layer with Fluentd, Elasticsearch and KibanaBuilding a Unified Logging Layer with Fluentd, Elasticsearch and Kibana
Building a Unified Logging Layer with Fluentd, Elasticsearch and Kibana
 
Serverless Event Streaming with Pulsar Functions
Serverless Event Streaming with Pulsar FunctionsServerless Event Streaming with Pulsar Functions
Serverless Event Streaming with Pulsar Functions
 
Log Event Stream Processing In Flink Way
Log Event Stream Processing In Flink WayLog Event Stream Processing In Flink Way
Log Event Stream Processing In Flink Way
 
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
 
Coordination in distributed systems
Coordination in distributed systemsCoordination in distributed systems
Coordination in distributed systems
 
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck - Pravega: Storage Rei...
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck -  Pravega: Storage Rei...Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck -  Pravega: Storage Rei...
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck - Pravega: Storage Rei...
 
When Streaming Needs Batch With Konstantin Knauf | Current 2022
When Streaming Needs Batch With Konstantin Knauf | Current 2022When Streaming Needs Batch With Konstantin Knauf | Current 2022
When Streaming Needs Batch With Konstantin Knauf | Current 2022
 
Dynamic log processing with fluentd and konfigurator
Dynamic log processing with fluentd and konfiguratorDynamic log processing with fluentd and konfigurator
Dynamic log processing with fluentd and konfigurator
 

Dernier

➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx9to5mart
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...amitlee9823
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 

Dernier (20)

➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 

Flink Connector Development Tips & Tricks

  • 1. Connector Tips & Tricks Eron Wright, Dell EMC @eronwright @ 2019 Dell EMC
  • 2. Who am I? ● Tech Staff at Dell EMC ● Contributor to Pravega stream storage system ○ Dynamically-sharded streams ○ Event-time tracking ○ Transaction support ● Maintainer of Flink connector for Pravega
  • 3. Overview Topics: ● Connector Basics ● Table Connectors ● Event Time ● State & Fault Tolerance
  • 5. Developing a Connector ● Applications take an explicit dependency on a connector ○ Not generally built-in to the Flink environment ○ Treated as a normal application dependency ○ Consider shading and relocating your connector’s dependencies ● Possible connector repositories: ○ Apache Flink repository ○ Apache Bahir (for Flink) repository ○ Your own repository
  • 6. Types of Flink Connectors ● Streaming Connectors ○ Provide sources and/or sinks ○ Sources may be bounded or unbounded ● Batch Connectors ○ Not discussed here ● Table Connectors ○ Provide tables which act as sources, sinks, or both ○ Unifies the batch and streaming programming model ○ Typically relies on a streaming and/or batch connector under the hood ○ A table’s update mode determines how a table is converted to/from a stream ■ Append Mode, Retract Mode, Upsert Mode
  • 7. Key Challenges ● How to parallelize your data source/sink ○ Subdivide the source data amongst operator subtasks, e.g. by partition ○ Support parallelism changes ● How to provide fault tolerance ○ Provide exactly-once semantics ○ Support coarse- and fine-grained recovery for failed tasks ○ Support Flink checkpoints and savepoints ● How to support historical and real-time processing ○ Facilitate correct program output ○ Support event time semantics ● Security considerations ○ Safeguarding secrets
  • 8. Connector Lifecycle ● Construction ○ Instantiated in the driver program (i.e. main method); must be serializable ○ Use the builder pattern to provide a DSL for your connector ○ Avoid making connections if possible ● State Initialization ○ Separate configuration from state ● Run ○ Supports both unbounded and bounded sources ● Cancel / Stop ○ Supports graceful termination (w/ savepoint) ○ May advance the event time clock to the end-of-time (MAX_WATERMARK)
  • 9. Connector Lifecycle (con’t) ● Advanced: Initialize/Finalize on Job Master ○ Exclusively for OutputFormat (e.g.. file-based sinks) ○ Implement InitializeOnMaster, FinalizeOnMaster, and CleanupWhenUnsuccessful ○ Support for Steaming API added in Flink 1.9; see FLINK-1722
  • 10. User-Defined Data Types ● Connectors are typically agnostic to the record data type ○ Expects application to supply type information w/ serializer ● For sources: ○ Accept a DeserializationSchema<T> ○ Implement ResultTypeQueryable<T> ● For sinks: ○ Accept a SerializationSchema<T> ● First-class support for Avro, Parquet, JSON ○ Geared towards Flink Table API
  • 11. Connector Metrics ● Flink exposes a metric system for gathering and reporting metrics ○ Reporters: Flink UI, JMX, InfluxDB, Prometheus, ... ● Use the metric API in your connector to expose relevant metric data ○ Types: counters, gauges, histograms, meters ● Metrics are tracked on a per-subtask basis ● More information: ○ Flink Documentation / Debugging & Monitoring / Metrics
  • 12. Connector Security ● Credentials are typically passed as ordinary program parameters ○ Beware lack of isolation between jobs in a given cluster ● Flink does have first-class support for Kerberos credentials ○ Based on keytabs (in support of long-running jobs) ○ Expects connector to use a named JAAS context ○ See: Kerberos Authentication Setup and Configuration
  • 14. Summary ● The Table API is evolving rapidly ○ For new connectors, focus on supporting the Blink planner ● Table sources and sinks are generally built upon the DataStream API ● Two configuration styles - typed DSL and string-based properties ● Table formats are connector-independent ○ E.g. CSV, JSON, Avro ● A catalog encapsulates a collection of tables, views, and functions ○ Provides convenience and interactivity ● More information: ○ Docs: User-Defined Sources & Sinks
  • 16. Key Considerations ● Connectors play an critical role in program correctness ○ Connector internals influence the order-of-observation (in event time) and hence the practicality of watermark generation ○ Connectors exhibit different behavior in historical vs real-time processing ● Event time skew leads to excess buffering and hence inefficiency ● There’s an inherent trade-off between latency and complexity
  • 17.
  • 18.
  • 19.
  • 20. Global Watermark Tracking ● Flink 1.9 has a facility for tracking a global aggregate value across sub-tasks ○ Ideal for establishing a global minimum watermark ○ See StreamingRuntimeContext#getGlobalAggregateManager ● Most useful in highly dynamic sources ○ Compensates for impact of resharding, rebalancing on event time ○ Increases latency ● See Kinesis connector’s JobManagerWatermarkTracker
  • 21. Source Idleness ● Downstream tasks depend on arrival of watermarks from all sub-tasks ○ Beware stalling the pipeline ● A sub-task may remove itself from consideration by idling ○ i.e. “release the hold on the event time clock” ● A source should be idled mainly for semantic reasons ○
  • 22. Sink Watermark Propagation ● Consider the possibility of watermark propagation across jobs ○ Propagate upstream watermarks along with output records ○ Job 1 → (external system) → Job 2 ● Sink function does have access to current watermark ○ But only when processing an input record 😞 ● Solution: event-time timers ○ Chain a ProcessFunction and corresponding SinkFunction, or develop a custom operator
  • 23. Practical Suggestions ● Provide an API to assign timestamps and to generate watermarks ○ Strive to isolate system internals, e.g. apply the watermark generator on a per-partition basis ○ Aggregate the watermarks into a per-subtask or global watermark ● Strive to minimize event time ‘skew’ across subtasks ○ Strategy: prioritize oldest data and pause ingestion of partitions that are too far ahead ○ See FLINK-10886 for improvements to Kinesis, Kafka connectors ● Remember: the goal is not a total ordering of elements (in event time)
  • 24. State & Fault Tolerance
  • 25. Working with State ● Sources are typically stateful, e.g. ○ partition assignment to sub-tasks ○ position tracking ● Use managed operator state to track redistributable units of work ○ List state - a list of redistributable elements (e.g. partitions w/ current position index) ○ Union list state - a variation where each sub-task gets the complete list of elements ● Various interfaces: ○ CheckpointedFunction - most powerful ○ ListCheckpointed - limited but convenient ○ CheckpointListener - to observe checkpoint completion (e.g. for 2PC)
  • 26. Exactly-Once Semantics ● Definition: evolution of state is based on a single observation of a given element ● Writes to external systems are ideally idempotent ● For sinks, Flink provides a few building blocks: ○ TwoPhaseCommitSinkFunction - base class providing a transaction-like API (but not storage) ○ GenericWriteAheadSink - implements a WAL using the state backend (see: CassandraSink) ○ CheckpointCommitter - stores information about completed checkpoints ● Savepoints present various complications ○ User may opt to resume from any prior checkpoint, not just the most recent checkpoint ○ The connector may be reconfigured w/ new inputs and/or outputs
  • 27. Advanced: Externally-Induced Sources ● Flink is still in control of initiating the overall checkpoint, with a twist! ● It allows a source to control the checkpoint barrier insertion point ○ E.g. based on incoming data or external coordination ● Hooks into the checkpoint coordinator on the master ○ Flink → Hook → External System → Sub-task ● See: ○ ExternallyInducedSource ○ WithMasterCheckpointHook
  • 28.
  • 29.
  • 30. Thank You! ● Feedback welcome (e.g. via the FF app) ● See me at the Speaker’s Lounge