http://flink-forward.org/kb_sessions/running-apache-flink-everywhere-standalone-yarn-mesos-docker-kubernetes-etc/
The world of cluster managers and deployment frameworks is getting complicated. There is a zoo of tools to deploy and manage data processing jobs, all of which handle resource management and fault tolerance slightly differently. Some tools have only per-job processes (YARN, Docker/Kubernetes), while others require some long-running processes (Mesos, Standalone). In some frameworks, streaming jobs control their own resource allocation (YARN, Mesos), while in others, resource management is handled by external tools (Kubernetes). To be broadly usable in a variety of setups, Flink needs to play well with all these frameworks and their paradigms. This talk describes Flink’s new proposed process and deployment model, which will make it work well with the above-mentioned frameworks. The new abstraction is designed to cover a variety of use cases, such as isolated single-job deployments, sessions of multiple short jobs, and multi-tenant setups.
3. How is Flink deployed?
Standalone Cluster
Embedded Service (OSGi)
YARN Sessions
YARN Jobs
Standalone Cloud
Docker on Mesos
Docker/Kubernetes
YARN->Myriad->Mesos
Mesos Sessions
Mesos Jobs (soon!)
A two-minute search on the mailing list reveals these setups.
4. How is Flink deployed?
Users mostly run isolated jobs or multi-job sessions
5. Resource Management
Resources are controlled either by the framework itself or by another service.
6. More dimensions coming up…
Dynamic Resources
• Number of TaskManagers changes over the job's lifetime
"Trusted" processes
• Run under superuser credentials and dispatch jobs
No blocking on any process type
• A YARN job needs to continue while the ApplicationMaster is down
Uniform vs. Heterogeneous Resources
• Run different functions in different-size containers
• E.g., a simple mapper in a small container, a heavy window operator in a large container
Avoiding "Job Submit" step
8. Flink Improvement Proposal 6
Currently driving parties:
Core Idea
• Create composable building blocks
• Create different compositions for different scenarios
FLIP-6 design document:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077
10. Recap: Current status (YARN)
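Diagram: Client; YARN Cluster with the YARN ResourceManager, the (FLINK) Application Master running the JobManager, and three TaskManagers.
(1) Client submits the YARN application to the YARN ResourceManager
(2) YARN ResourceManager spawns the Application Master
(3) Client polls the application status
(4) Application Master starts the TaskManagers
(5) TaskManagers register
(6) All TaskManagers have started
(7) Client submits the job
(8) JobManager deploys the tasks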
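The ordering of these eight steps can be made concrete with a toy Python model of the current (pre-FLIP-6) YARN startup sequence. Every class and method name here is hypothetical and not part of Flink's or YARN's API. The model illustrates one point: the client stays attached and polls until all TaskManagers have registered, and only then can it submit the job.

```python
class JobManager:
    """Runs inside the Application Master in the current YARN mode (hypothetical model)."""

    def __init__(self):
        self.registered = []  # TaskManager ids in registration order (step 5)

    def register(self, tm_id):
        self.registered.append(tm_id)

    def deploy(self, tasks):
        # Step 8: spread tasks round-robin over the registered TaskManagers.
        return {task: self.registered[i % len(self.registered)]
                for i, task in enumerate(tasks)}


class ApplicationMaster:
    """Spawned by YARN (step 2); starts a fixed number of TaskManagers (step 4)."""

    def __init__(self, num_task_managers):
        self.job_manager = JobManager()
        self.pending = [f"tm-{i}" for i in range(num_task_managers)]

    def advance(self):
        # Simulate one TaskManager container coming up and registering (steps 4-5).
        if self.pending:
            self.job_manager.register(self.pending.pop(0))

    def all_started(self):
        # Step 6: the condition the polling client waits for.
        return not self.pending


def run_yarn_job(num_task_managers, tasks):
    """Client side (steps 1-3 and 7). Returns (number of polls, task placement)."""
    am = ApplicationMaster(num_task_managers)   # steps 1-2: submit app, AM is spawned
    polls = 0
    while not am.all_started():                 # step 3: client blocks and polls
        polls += 1
        am.advance()
    return polls, am.job_manager.deploy(tasks)  # steps 7-8: submit job, deploy tasks


if __name__ == "__main__":
    print(run_yarn_job(2, ["source", "map", "sink"]))
```

With two TaskManagers the client polls twice before it can submit. FLIP-6 removes exactly this wait.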
11. The Building Blocks
ResourceManager
• ClusterManager-specific
• May live across jobs
• Manages available Containers/TaskManagers
• Used to acquire / release resources
TaskManager
• Registers at ResourceManager
• Gets tasks from one or more JobManagers
JobManager
• Single job only, started per job
• Thinks in terms of "task slots"
• Deploys and monitors job/task execution
Dispatcher
• Lives across jobs
• Touch-point for job submissions
• Spawns JobManagers
• May spawn ResourceManager
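The following minimal Python sketch shows how the four building blocks might interact: a long-lived Dispatcher spawns one JobManager per job, TaskManagers register their slots at a shared ResourceManager, and each JobManager requests slots from it. All names are hypothetical illustrations of the roles listed above, not Flink's classes or API.

```python
class TaskManager:
    """Registers at the ResourceManager and offers a fixed number of slots."""

    def __init__(self, tm_id, num_slots):
        self.tm_id = tm_id
        self.num_slots = num_slots


class ResourceManager:
    """Cluster-manager specific; may live across jobs; hands out slots."""

    def __init__(self):
        self.free_slots = []  # (tm_id, slot_index) pairs

    def register(self, tm):
        self.free_slots.extend((tm.tm_id, i) for i in range(tm.num_slots))

    def request_slot(self):
        return self.free_slots.pop(0) if self.free_slots else None


class JobManager:
    """Single job only; thinks in terms of task slots."""

    def __init__(self, job_name, resource_manager):
        self.job_name = job_name
        self.resource_manager = resource_manager

    def run(self, tasks):
        assignment = {}
        for task in tasks:
            slot = self.resource_manager.request_slot()
            if slot is None:
                raise RuntimeError(f"{self.job_name}: no free slot for {task}")
            assignment[task] = slot
        return assignment


class Dispatcher:
    """Lives across jobs; touch-point for submissions; spawns one JobManager per job."""

    def __init__(self, resource_manager):
        self.resource_manager = resource_manager
        self.job_managers = {}

    def submit(self, job_name, tasks):
        jm = JobManager(job_name, self.resource_manager)
        self.job_managers[job_name] = jm
        return jm.run(tasks)


if __name__ == "__main__":
    rm = ResourceManager()
    rm.register(TaskManager("tm-a", 2))
    rm.register(TaskManager("tm-b", 1))
    dispatcher = Dispatcher(rm)
    print(dispatcher.submit("job-1", ["source", "sink"]))
    print(dispatcher.submit("job-2", ["map"]))
```

The two jobs get separate JobManagers but draw slots from the same ResourceManager. This is the "compose building blocks" idea in miniature.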
14. Building Flink-on-YARN
Main differences from the current YARN mode
All containers are started with the JARs and config files on the classpath
Credentials & secrets are strictly bound to a single job
Slots are allocated as needed and released when freed
• Basic building block for elastic resource usage
The client disconnects after submitting the job; it does not need to wait until the TaskManagers are up
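Allocating slots on demand and releasing them when freed is what makes elastic resource usage possible. The sketch below is hypothetical and assumes every container holds the same fixed number of slots. New containers are "requested" only when no free slot exists, and a container is given back as soon as none of its slots is in use. Flink's actual release policy may differ.

```python
class ElasticResourceManager:
    """Hypothetical sketch: containers are started on demand and released when idle."""

    def __init__(self, slots_per_container):
        self.slots_per_container = slots_per_container
        self.containers = {}  # container id -> set of slot indices currently in use
        self._next_id = 0

    def allocate_slot(self):
        # Prefer a free slot in a container we already hold.
        for cid, used in self.containers.items():
            for i in range(self.slots_per_container):
                if i not in used:
                    used.add(i)
                    return (cid, i)
        # Otherwise request a new container from the cluster manager (simulated).
        cid = f"container-{self._next_id}"
        self._next_id += 1
        self.containers[cid] = {0}
        return (cid, 0)

    def release_slot(self, slot):
        cid, i = slot
        self.containers[cid].discard(i)
        # Hand the whole container back once none of its slots is in use.
        if not self.containers[cid]:
            del self.containers[cid]


if __name__ == "__main__":
    rm = ElasticResourceManager(slots_per_container=2)
    slots = [rm.allocate_slot() for _ in range(3)]
    print(slots, sorted(rm.containers))
    rm.release_slot(slots[2])
    print(sorted(rm.containers))
```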
17. Building Flink-on-Mesos
Diagram: Client; Mesos Cluster with the Mesos Master, the Flink Mesos Dispatcher, a Flink Master Process (Flink Mesos ResourceManager, JobManager), and three TaskManagers.
(1) Client sends the JobGraph/JARs to the Dispatcher via HTTP POST
(2) Dispatcher allocates a container for the Flink master
(3) Dispatcher starts the Flink master process (and supervises it)
(4) JobManager requests slots
(5) ResourceManager starts TaskManagers
(6) TaskManagers register
(7) JobManager deploys tasks
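The hypothetical Python model below traces the Mesos flow step by step. The only real computation is step (5): how many TaskManagers the ResourceManager has to start for a slot request. This assumes every TaskManager offers the same fixed number of slots. The function names are illustrative, not Flink or Mesos API.

```python
import math


def task_managers_to_start(requested_slots, slots_per_tm):
    """Step (5): TaskManagers needed to cover a slot request (assumes fixed slots per TM)."""
    return math.ceil(requested_slots / slots_per_tm)


def mesos_job_trace(job_name, requested_slots, slots_per_tm):
    """Return steps (1)-(7) of the Flink-on-Mesos flow as an ordered trace."""
    num_tms = task_managers_to_start(requested_slots, slots_per_tm)
    return [
        f"(1) HTTP POST JobGraph/JARs of {job_name} to the Dispatcher",
        "(2) Dispatcher allocates a container for the Flink master",
        "(3) Dispatcher starts and supervises the Flink master process",
        f"(4) JobManager requests {requested_slots} slots from the ResourceManager",
        f"(5) ResourceManager starts {num_tms} TaskManagers",
        "(6) TaskManagers register with the ResourceManager",
        "(7) JobManager deploys tasks",
    ]


if __name__ == "__main__":
    for line in mesos_job_trace("wordcount", requested_slots=5, slots_per_tm=2):
        print(line)
```

Rounding up means the last TaskManager may carry spare slots. Those could serve later requests, or be released as sketched in the Flink-on-YARN elastic-slot model.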
18. Building Standalone
Diagram: a Standalone Cluster running the Flink Cluster: the Flink Master Process (Dispatcher, Standalone ResourceManager, two JobManagers), two Standby Master Processes, and three TaskManagers.
(1) TaskManagers register
(1) Client submits the JobGraph/JARs
(2) Dispatcher starts a JobManager
(3) JobManager requests slots
(7) JobManager deploys tasks
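The standby master processes take over when the active master fails. The following hypothetical sketch models this with a deliberately simple rule: the first master process in a fixed list that is still alive is the active one. This rule stands in for real leader election, which Flink's standalone high-availability mode coordinates through ZooKeeper. None of these names are Flink API.

```python
class MasterProcess:
    def __init__(self, name):
        self.name = name
        self.alive = True


class StandaloneMasters:
    """Hypothetical model of one active Flink master plus standby master processes.

    Simplification: the active master is the first process in the list that is
    still alive. A real setup needs a coordination service to agree on this.
    """

    def __init__(self, names):
        self.processes = [MasterProcess(n) for n in names]

    def active(self):
        for p in self.processes:
            if p.alive:
                return p.name
        return None  # no master left: the cluster cannot accept jobs

    def crash(self, name):
        for p in self.processes:
            if p.name == name:
                p.alive = False


if __name__ == "__main__":
    masters = StandaloneMasters(["master", "standby-1", "standby-2"])
    print(masters.active())
    masters.crash("master")
    print(masters.active())
```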
19. Building Flink-on-Docker/K8S
Diagram: a Master Container running the Flink Master Process (Flink-Container: Program Runner, ResourceManager, JobManager) and three Worker Containers, each running a TaskManager.
(1) Container framework starts the Master & Worker Containers
(2) Run & start (Program Runner)
(3) TaskManagers register
(4) Deploy tasks
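In this setup the container framework, not Flink, decides how many workers exist. The hypothetical Python sketch below shows a ResourceManager that never launches containers. It only records the slots of TaskManagers that register (step 3) and serves slot requests from them. The class and method names are illustrative assumptions, not Flink's API.

```python
class PassiveResourceManager:
    """Hypothetical sketch for Docker/Kubernetes-style setups.

    The container framework starts the worker containers (step 1). This
    ResourceManager cannot start containers; it only tracks registered slots.
    """

    def __init__(self):
        self.free_slots = []  # (tm_id, slot_index) pairs

    def on_register(self, tm_id, num_slots):
        # Step 3: a TaskManager in a worker container registers its slots.
        self.free_slots.extend((tm_id, i) for i in range(num_slots))

    def request_slots(self, count):
        # Requests can only be served from what has already registered;
        # returning None means the JobManager has to wait for more workers.
        if len(self.free_slots) < count:
            return None
        granted = self.free_slots[:count]
        self.free_slots = self.free_slots[count:]
        return granted


if __name__ == "__main__":
    rm = PassiveResourceManager()
    print(rm.request_slots(2))
    rm.on_register("worker-0", 1)
    rm.on_register("worker-1", 1)
    print(rm.request_slots(2))
```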
20. Building Flink-on-Docker/K8S
This is a blueprint for all setups where external services control resources and start new TaskManagers
• For example, an AWS EC2 Flink image with auto-scaling groups
Can be extended to have N equal containers, of which one becomes the master and the remainder become workers
With the upcoming dynamic-scaling feature (see Till's talk), the JobManager scales the job to use all available resources
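The "N equal containers" idea can be sketched in Python under two hypothetical assumptions. First, all containers see the same list of container ids and pick the smallest one as master; a real deployment would need a coordination service for this. Second, with dynamic scaling the job's parallelism simply equals the number of worker slots. Neither function is Flink API.

```python
def assign_roles(container_ids):
    """Hypothetical rule: the container with the smallest id becomes the master."""
    master = min(container_ids)
    workers = sorted(c for c in container_ids if c != master)
    return master, workers


def available_parallelism(workers, slots_per_worker):
    """With dynamic scaling, the JobManager would use every available slot."""
    return len(workers) * slots_per_worker


if __name__ == "__main__":
    master, workers = assign_roles(["c-2", "c-0", "c-1"])
    print(master, workers, available_parallelism(workers, slots_per_worker=4))
```

Suppose an auto-scaling group adds a container. Re-running the same computation over the larger list yields more workers and, with dynamic scaling, a higher parallelism.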
23. Sessions vs. Jobs
For each job submitted, the session spawns its own JobManager
All jobs run under session-user credentials
ResourceManager holds on to containers for a certain time
• Jobs quickly following one another reuse containers (quicker response)
Internally, sessions build on the dispatcher component
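Holding on to containers "for a certain time" amounts to an idle-timeout cache. The hypothetical Python sketch below keeps released containers for `idle_timeout` seconds, so a job that follows shortly afterwards reuses them instead of requesting new ones. The timeout value and all names are assumptions for illustration. Time is passed in explicitly so the behavior is deterministic.

```python
class SessionResourceManager:
    """Hypothetical sketch: keep idle containers for `idle_timeout` seconds."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self.idle = {}     # container id -> time it became idle (insertion-ordered)
        self.launched = 0  # containers requested from the cluster manager so far

    def acquire(self, now):
        self._expire(now)
        if self.idle:
            cid = next(iter(self.idle))  # reuse the container idle the longest
            del self.idle[cid]
            return cid
        self.launched += 1
        return f"container-{self.launched}"

    def release(self, cid, now):
        self.idle[cid] = now

    def _expire(self, now):
        # Return containers idle for too long to the cluster manager
        # (simulated here by forgetting them).
        for cid, since in list(self.idle.items()):
            if now - since >= self.idle_timeout:
                del self.idle[cid]


if __name__ == "__main__":
    rm = SessionResourceManager(idle_timeout=30)
    c = rm.acquire(now=0)
    rm.release(c, now=10)
    print(rm.acquire(now=20), rm.launched)  # quick follow-up job reuses the container
```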
25. More stuff
Dynamically acquire/release resources
• Slots are allocated from and released to the ResourceManager as needed
• ResourceManager allocates/releases containers over time
• Strong interplay with "Dynamic Scaling" (cf. Till's talk yesterday)
Resource Profiles: containers of different sizes
• Requests can pass a "profile" (CPU / memory / disk) or simply use the "default profile"
• The YARN and Mesos ResourceManagers can allocate correspondingly sized containers
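A resource profile can be read as a per-dimension lower bound: a container fits a request if it has at least the requested CPU, memory and disk. The hypothetical Python sketch below falls back to a default profile when the request carries none, and picks the smallest fitting container. The `ContainerProfile` name, the default values and the "smallest fit" policy are assumptions, not Flink's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerProfile:
    """Hypothetical profile with the slide's CPU / memory / disk dimensions."""
    cpu_cores: float
    memory_mb: int
    disk_mb: int

    def fits_into(self, other):
        # A request fits into a container if every dimension is large enough.
        return (self.cpu_cores <= other.cpu_cores
                and self.memory_mb <= other.memory_mb
                and self.disk_mb <= other.disk_mb)


DEFAULT_PROFILE = ContainerProfile(cpu_cores=1.0, memory_mb=1024, disk_mb=1024)  # hypothetical


def pick_container(request, available):
    """Return the smallest available container satisfying the request, or None."""
    if request is None:
        request = DEFAULT_PROFILE
    candidates = [c for c in available if request.fits_into(c)]
    if not candidates:
        return None
    return min(candidates, key=lambda c: (c.memory_mb, c.cpu_cores, c.disk_mb))


if __name__ == "__main__":
    small = ContainerProfile(1.0, 1024, 1024)
    large = ContainerProfile(4.0, 8192, 10240)
    print(pick_container(None, [large, small]))                             # simple mapper
    print(pick_container(ContainerProfile(2.0, 4096, 0), [small, large]))  # heavy window operator
```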
26. Wrapping it up
It’s a zoo of cluster managers out there
• Following different paradigms
Usage patterns vary because of Flink's broad use cases
• Isolated long running jobs vs. many short-lived jobs
• Shared clusters vs. per-user authenticated resources
We are making "jobs" and "sessions" explicit constructs
Flexible building blocks, composed in various ways to accommodate different scenarios
28. Flink Streaming cornerstones
Performant Streaming
• Low latency
• High throughput
• Well-behaved flow control (back pressure)
Event Time
• Make more sense of data
• Works on real-time and historic data
APIs & Libraries
• Flexible windows (time, count, session, roll-your-own)
• Complex Event Processing
Stateful Streaming
• Globally consistent savepoints
• Exactly-once semantics for fault tolerance
• Windows & user-defined state