Scaling a real-time streaming warehouse with Apache Flink, Parquet and Kubernetes
At Branch, we process more than 12 billion events per day, and we store and aggregate terabytes of data daily. We use Apache Flink to process, transform and aggregate events, and Parquet as the data storage format. This talk covers our challenges with scaling our warehouse, namely:
How did we scale our Flink-Parquet warehouse to handle a 3x increase in traffic?
How do we ensure exactly-once, event-time-based, fault-tolerant processing of events?
In this talk, we also give an overview of how we deploy and scale our streaming warehouse, including:
How we scaled our Parquet warehouse by tuning memory
Running on a Kubernetes cluster for resource management
How we migrated our streaming jobs with no disruption from Mesos to Kubernetes
Our challenges and learnings along the way