Adventures in Scaling from Zero to 5 Billion Data Points per Day
At Flink Forward San Francisco 2018 our team at Comcast presented the operationalized streaming ML framework which had just gone into production. This year in just a few short months we scaled a Customer Experience use case from an initial trickle of volume to processing over 5 Billion data points per day. This use case is used to help diagnose potential issues with High Speed Data service and provide recommendations to solving this issues as quickly and as cost-effectively as possible.
As with any solution that grows quickly, our platform faced challenges, bottlenecks, and technology limits; forcing us to quickly adapt and evolve our approach to enable handling 50,000+ data points per second.
We will introduce the problems, approaches, solutions, and lessons we learned along the way including: The Trigger and Diagnosis Problem, The REST problem, The “Feature Store” Problem, The “Customer State” Problem, The Savepoint Problem, The HA Problem, The Volume Problem, and of course The Really High Volume Feature Store Problem #2.