This document discusses Apache Flink and its integration with Apache Mesos and DC/OS. It provides an overview of Flink's stream processing capabilities and common use cases. It then describes how Flink runs on Mesos using Fenzo for resource allocation and task scheduling. A demo is shown of a financial data pipeline using Kafka for ingestion and output, and Flink for real-time stream processing on Mesos via DC/OS. In conclusion, Flink is a modern stream processor that can run on Mesos through its integration, and DC/OS provides an easy-to-use package for deploying Flink.
6. FMACK Stack
EVENTS: Ubiquitous data streams from connected devices (sensors, devices, clients)
▪ INGEST: Apache Kafka. Ingest millions of events per second
▪ STORE: Apache Cassandra. Distributed & highly scalable database
▪ ANALYZE: Apache Flink. Real-time and batch data processing
▪ ACT: Akka. Visualize data & build data-driven apps
▪ Platform: Mesos / DC/OS
16. Detecting fraud in real time
▪ Live 24/7 service
▪ As fraudsters get better, models need to be updated without downtime
▪ In: credit card transactions, plus evolving fraud models built by data scientists
▪ Out: notifications and alerts
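One way to read "update models without downtime" is as a single operator that consumes two kinds of events on the same stream: transactions and model updates. The following is a minimal plain-Python sketch of that idea, not Flink code; the threshold "model", the event classes, and the function name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple, Union


@dataclass(frozen=True)
class Transaction:
    card: str
    amount: float


@dataclass(frozen=True)
class ModelUpdate:
    # Hypothetical "model": flag any transaction above this amount.
    max_amount: float


Event = Union[Transaction, ModelUpdate]


def detect_fraud(events: Iterable[Event], initial: ModelUpdate) -> List[Tuple[str, float]]:
    """Score each transaction with the latest model seen so far on the stream."""
    model = initial
    alerts = []
    for event in events:
        if isinstance(event, ModelUpdate):
            # Swap the model in place; the loop over the stream never stops.
            model = event
        elif event.amount > model.max_amount:
            alerts.append((event.card, event.amount))
    return alerts


stream = [
    Transaction("card-1", 900.0),
    Transaction("card-2", 1500.0),
    ModelUpdate(max_amount=500.0),  # a stricter model arrives mid-stream
    Transaction("card-3", 900.0),
]
alerts = detect_fraud(stream, ModelUpdate(max_amount=1000.0))
print(alerts)
```

The same 900.0 transaction is accepted before the update and flagged after it, which is the behavior the slide asks for: the model changes while the service keeps running.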
17. AthenaX (https://eng.uber.com/athenax/)
▪ Streaming analytics platform
▪ SQL as abstraction layer
▪ Streams from Hadoop, Kafka, etc. are processed with SQL, thresholds, and actions to produce analytics, alerts, and derived streams
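To illustrate what "SQL as abstraction layer" with thresholds and actions can look like from the user's side, here is a small sketch. It is not AthenaX: an in-memory SQLite table stands in for the stream, and the table name, columns, sample rows, and threshold are all made up for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the streaming SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (region TEXT, latency_ms REAL)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?)",
    [("eu", 40.0), ("eu", 60.0), ("us", 120.0), ("us", 180.0)],  # hypothetical data
)

# The user-facing part: a SQL query plus a threshold.
QUERY = """
    SELECT region, AVG(latency_ms) AS avg_latency
    FROM requests
    GROUP BY region
    HAVING AVG(latency_ms) > ?
    ORDER BY region
"""
THRESHOLD_MS = 100.0


def alert(region: str, avg_latency: float) -> str:
    """The action: here it only formats a message."""
    return f"ALERT {region}: average latency {avg_latency:.1f} ms"


alerts = [alert(region, avg) for region, avg in conn.execute(QUERY, (THRESHOLD_MS,))]
print(alerts)
```

The point of the design is that the analyst only writes the query and the threshold; how the query is executed over a stream is left to the platform.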
18. ▪ Blink based on Flink
▪ A core system in Alibaba Search
• Machine learning, search, recommendations
• A/B testing of search algorithms
• Online feature updates to boost conversion rate
19. Drivetribe
▪ Complete social network
▪ Implemented using event sourcing and CQRS (Command Query Responsibility Segregation)
https://data-artisans.com/blog/drivetribe-cqrs-apache-flink
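For readers unfamiliar with the pattern, the sketch below shows event sourcing and CQRS in miniature: commands only append events to a log, and queries are served from a read model rebuilt by replaying that log. It is a generic illustration in plain Python, not the implementation described in the linked post; the event types and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Followed:
    follower: str
    followee: str


@dataclass(frozen=True)
class Unfollowed:
    follower: str
    followee: str


event_log: List[object] = []  # append-only source of truth


def handle_follow(follower: str, followee: str) -> None:
    """Command side: validate, then append an event; state is never updated in place."""
    if follower == followee:
        raise ValueError("cannot follow yourself")
    event_log.append(Followed(follower, followee))


def handle_unfollow(follower: str, followee: str) -> None:
    event_log.append(Unfollowed(follower, followee))


def project_followers(events) -> Dict[str, Set[str]]:
    """Query side: build a read model by replaying the log from the start."""
    followers = defaultdict(set)
    for event in events:
        if isinstance(event, Followed):
            followers[event.followee].add(event.follower)
        elif isinstance(event, Unfollowed):
            followers[event.followee].discard(event.follower)
    return dict(followers)


handle_follow("alice", "bob")
handle_follow("carol", "bob")
handle_unfollow("alice", "bob")
read_model = project_followers(event_log)
print(read_model)
```

Because the log keeps all three events, a new read model (for example, "who unfollowed whom") can be derived later without changing the command side.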
21. Why Apache Mesos?
▪ Mesos offers full functionality to implement fault-tolerant and elastic distributed applications
▪ 30% of survey respondents were running Flink on Mesos (prior to proper Mesos support, September 2016)
26. DC/OS
Datacenter Operating System (DC/OS)
Distributed Systems Kernel (Mesos)
Big Data + Analytics Engines
Microservices (in containers)
Streaming
Batch
Machine Learning
Analytics
Functions & Logic
Search
Time Series
SQL / NoSQL
Databases
Modern App Components
Any Infrastructure (Physical, Virtual, Cloud)
27. Demo Time
Generator
▪ Financial data generated by generator
▪ Written to Kafka topics
▪ Kafka topics consumed by Flink
▪ Flink pipeline operates on Kafka data
▪ Results written back into Kafka
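The data flow in these steps can be mimicked in a few lines of plain Python. This is only a sketch under stated assumptions: two in-memory queues stand in for the Kafka topics, a function stands in for the Flink job, and the record layout, account names, and the windowed sum are invented, since the slides do not say what the demo pipeline computes. Unlike a real streaming job, it drains the input once instead of running continuously.

```python
from collections import defaultdict, deque

input_topic = deque()   # stands in for the Kafka topic the generator writes to
output_topic = deque()  # stands in for the Kafka topic that receives results


def generate(n: int) -> None:
    """Hypothetical generator: deterministic (timestamp, account, amount) transfers."""
    accounts = ["acct-a", "acct-b"]
    for i in range(n):
        input_topic.append((i, accounts[i % 2], 100 * (i + 1)))


def run_job(window_size: int) -> None:
    """Sum amounts per account in tumbling windows of `window_size` time units."""
    totals = defaultdict(int)
    while input_topic:
        ts, account, amount = input_topic.popleft()
        window_start = ts - ts % window_size  # start of the window this event falls in
        totals[(window_start, account)] += amount
    # Write results back, ordered by window start and then account.
    for (window_start, account), total in sorted(totals.items()):
        output_topic.append((window_start, account, total))


generate(6)
run_job(window_size=4)
results = list(output_topic)
print(results)
```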
29. TL;DL
▪ Apache Flink is a modern stream processor for real-time processing and event-driven applications
▪ Apache Flink runs on Mesos using Fenzo
▪ DC/OS offers an easy-to-use Flink package
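Fenzo is a scheduling library that matches pending tasks against Mesos resource offers. The sketch below illustrates that kind of matching with a simple bin-packing heuristic; it does not use or reproduce Fenzo's actual API, and the class names, the fitness formula, and the sample resources are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Offer:
    agent: str
    cpus: float
    mem_mb: int


@dataclass(frozen=True)
class Task:
    name: str
    cpus: float
    mem_mb: int


def fitness(task: Task, offer: Offer) -> Optional[float]:
    """Return None if the task does not fit, else a score in (0, 1]; higher packs tighter."""
    if offer.cpus <= 0 or offer.mem_mb <= 0:
        return None
    if task.cpus > offer.cpus or task.mem_mb > offer.mem_mb:
        return None
    return (task.cpus / offer.cpus + task.mem_mb / offer.mem_mb) / 2


def assign(tasks: List[Task], offers: List[Offer]) -> Dict[str, str]:
    """Place each task on the fitting offer with the highest score (hypothetical heuristic)."""
    placement = {}
    for task in tasks:
        scored = [(fitness(task, offer), offer) for offer in offers]
        fitting = [(score, offer) for score, offer in scored if score is not None]
        if not fitting:
            continue  # no offer fits; the task waits for future offers
        _, best = max(fitting, key=lambda pair: pair[0])
        best.cpus -= task.cpus  # consume the resources of the chosen offer
        best.mem_mb -= task.mem_mb
        placement[task.name] = best.agent
    return placement


offers = [Offer("agent-1", 4.0, 8192), Offer("agent-2", 2.0, 4096)]
tasks = [Task(f"taskmanager-{i}", 2.0, 4096) for i in (1, 2, 3)]
placement = assign(tasks, offers)
print(placement)
```

In this run the first task goes to the smaller agent because it fills it exactly, leaving the larger agent free for the remaining two.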