Chocolate, ice cream and games are perhaps three of the most universally understood words, and they can bring joy to anyone between 5 and 60 years of age!
InnoGames is one of the world's leading developers and providers of online games. At InnoGames we not only have all three of those things; we have also built a powerful data infrastructure, because it is expensive to run your business blind. Evaluating key performance indicators quickly enough to make good decisions, and delivering personalized, relevant content to each and every gamer, is essential to success. It is how a customer becomes a fan.
Our data infrastructure mainly consists of a data pipeline that covers the streaming part and a data platform that performs batch processing. The latter is based on the Hadoop ecosystem and uses technologies such as Hive, Spark, Hue, R and more to give our data scientists a high degree of flexibility. The data pipeline went through several evolutions, starting with Kestrel and custom streaming applications. Later on we switched the base technologies to Apache Kafka and Apache Storm. Last year we rebuilt our streaming infrastructure on Apache Flink, an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications.
Because having fun is the best way to learn, this talk gives a quick introduction to Flink and the Flink ecosystem, then focuses on real-world use cases and carries the ideas behind those projects over into live examples. This way, the audience becomes part of a Flink-based experiment and can internalize the experience we gained with Flink.
19. SIMILARITIES
THE FIRST IMPRESSION COUNTS
The moment the customer enters the shop or the player plays his/her first session is crucial
HALO EFFECT
When one trait of a person or thing is used to make an overall judgment of that person or thing
20. IN ORDER TO MAKE A POSITIVE IMPACT
A RESPONSE NEEDS TO HAPPEN QUICKLY
33. EVERYTHING IS A STREAM
UNBOUNDED STREAMS
BOUNDED STREAMS (AKA BATCH PROCESSING)
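The idea on this slide can be illustrated without Flink at all. The following is a minimal sketch using plain `java.util.stream` as a stand-in (an assumption for illustration, not the Flink API): the same transformation, here a hypothetical `doubled` helper, is applied once to an input with a known end and once to an input that never ends.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class BoundedVsUnbounded {
    // The same logic works whether or not the input ever ends.
    static Stream<Integer> doubled(Stream<Integer> in) {
        return in.map(i -> i * 2);
    }

    public static void main(String[] args) {
        // Bounded: the input has a known end, so the complete result can be collected.
        List<Integer> batch = doubled(Stream.of(1, 2, 3)).collect(Collectors.toList());
        System.out.println(batch); // [2, 4, 6]

        // Unbounded: the input never ends, so we can only ever look at a prefix of the result.
        List<Integer> firstThree = doubled(Stream.iterate(1, i -> i + 1))
                .limit(3)
                .collect(Collectors.toList());
        System.out.println(firstThree); // [2, 4, 6]
    }
}
```

Batch processing is then just the special case in which the stream happens to end.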
34. TIME IN STREAMING
ORDERED BY EVENT TIME (years show PROCESSING TIME)
EPISODE I: The Phantom Menace (1999)
EPISODE II: Attack of the Clones (2002)
EPISODE III: Revenge of the Sith (2005)
EPISODE IV: A New Hope (1977)
EPISODE V: The Empire Strikes Back (1980)
EPISODE VI: Return of the Jedi (1983)
EPISODE VII: The Force Awakens (2015)
EPISODE VIII: The Last Jedi (2017)
EPISODE IX: ? (2019)
35. TIME IN STREAMING
ORDERED BY PROCESSING TIME (episode numbers show EVENT TIME)
EPISODE IV: A New Hope (1977)
EPISODE V: The Empire Strikes Back (1980)
EPISODE VI: Return of the Jedi (1983)
EPISODE I: The Phantom Menace (1999)
EPISODE II: Attack of the Clones (2002)
EPISODE III: Revenge of the Sith (2005)
EPISODE VII: The Force Awakens (2015)
EPISODE VIII: The Last Jedi (2017)
EPISODE IX: ? (2019)
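The two orderings on these slides can be reproduced with a small sketch. It is plain Java rather than Flink code, and the `Episode` class and `numbersOrderedBy` helper are hypothetical names introduced only for illustration. The episode number plays the role of event time (when something happened in the story) and the release year plays the role of processing time (when the audience received it).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class TimeOrdering {
    static final class Episode {
        final int number;      // event time: position in the story
        final int releaseYear; // processing time: when it reached the audience

        Episode(int number, int releaseYear) {
            this.number = number;
            this.releaseYear = releaseYear;
        }
    }

    static List<Episode> episodes() {
        int[] years = {1999, 2002, 2005, 1977, 1980, 1983, 2015, 2017, 2019};
        List<Episode> all = new ArrayList<>();
        for (int i = 0; i < years.length; i++) {
            all.add(new Episode(i + 1, years[i]));
        }
        return all;
    }

    // Returns the episode numbers after sorting with the given notion of time.
    static List<Integer> numbersOrderedBy(Comparator<Episode> order) {
        return episodes().stream()
                .sorted(order)
                .map(e -> e.number)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Event time: [1, 2, 3, 4, 5, 6, 7, 8, 9]
        System.out.println(numbersOrderedBy(Comparator.comparingInt((Episode e) -> e.number)));
        // Processing time: [4, 5, 6, 1, 2, 3, 7, 8, 9]
        System.out.println(numbersOrderedBy(Comparator.comparingInt((Episode e) -> e.releaseYear)));
    }
}
```

A stream processor that only looks at arrival order sees the second list; recovering the first requires a timestamp carried inside each event.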
40. BUILDING BLOCKS
APIs, from CONCISENESS to EXPRESSIVENESS:
SQL / TABLE API (dynamic tables): HIGH LEVEL ANALYTICS API
DataStream API (streams, windows): STREAM AND BATCH DATA PROCESSING
ProcessFunction (events, state, time): STATEFUL EVENT-DRIVEN APPLICATIONS
41. BUILDING BLOCKS
SQL / TABLE API (dynamic tables): HIGH LEVEL ANALYTICS API
UNBOUNDED STREAM -> DYNAMIC TABLE -> CONTINUOUS QUERY -> DYNAMIC TABLE -> UNBOUNDED STREAM
42. BUILDING BLOCKS
SQL / TABLE API (dynamic tables): HIGH LEVEL ANALYTICS API
SELECT *
FROM Ticker
MATCH_RECOGNIZE (
    PARTITION BY symbol
    ORDER BY rowtime
    MEASURES
        C.price AS lastPrice
    ONE ROW PER MATCH
    AFTER MATCH SKIP PAST LAST ROW
    PATTERN (A B* C)
    DEFINE
        A AS A.price > 10,
        B AS B.price < 15,
        C AS C.price > 12
)
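To make the pattern in the query above more tangible, here is a rough sketch of what `PATTERN (A B* C)` with these `DEFINE` conditions selects. It is plain Java over a bounded, already ordered array of prices for a single symbol; it is an illustration under those assumptions, not how Flink evaluates the query, and `lastPrices` is a hypothetical helper name. Partitioning by symbol and ordering by rowtime are left out.

```java
import java.util.ArrayList;
import java.util.List;

final class PatternSketch {
    /** Returns lastPrice (the price of row C) for every match of A B* C. */
    static List<Integer> lastPrices(int[] price) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < price.length) {
            if (price[i] <= 10) { // row does not satisfy A (price > 10)
                i++;
                continue;
            }
            // Greedy B*: extend the run while rows satisfy B (price < 15).
            int end = i + 1;
            while (end < price.length && price[end] < 15) {
                end++;
            }
            // Try C (price > 12) right after the longest B run, then backtrack.
            int c = -1;
            for (int p = Math.min(end, price.length - 1); p > i; p--) {
                if (price[p] > 12) {
                    c = p;
                    break;
                }
            }
            if (c < 0) { // no row can close the match that starts at i
                i++;
                continue;
            }
            result.add(price[c]);
            i = c + 1; // AFTER MATCH SKIP PAST LAST ROW
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(lastPrices(new int[] {11, 13, 14, 16})); // [16]
        System.out.println(lastPrices(new int[] {5, 11, 13, 9}));   // [13]
    }
}
```

In the first input, 11 starts the match, 13 and 14 are absorbed by `B*`, and 16 closes it. In the second, the greedy run reaches the end of the data, so the match backtracks and closes on 13.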
44. LET'S HAVE A CLOSER LOOK
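import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Inside a method that declares `throws Exception`:
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
final DataStreamSource<Integer> stream = env.fromElements(1, 2, 3, 4); // bounded input
stream
    .map((MapFunction<Integer, Integer>) i -> i + 2)    // 1, 2, 3, 4 becomes 3, 4, 5, 6
    .filter((FilterFunction<Integer>) i -> i % 2 == 0)  // keeps 4 and 6
    .print();                                           // writes each element to stdout
env.execute(); // the lines above only build the job; execute() runs it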
DATA SOURCE
TRANSFORMATION
DATA SINK
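For readers without a Flink setup at hand, the same source, transformation and sink shape can be tried with plain Java streams. This is only an analogy, assumed here for illustration: `java.util.stream` is not the Flink API, and `PipelineSketch` is a hypothetical name.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class PipelineSketch {
    static List<Integer> run() {
        return Stream.of(1, 2, 3, 4)       // data source
                .map(i -> i + 2)           // transformation: 3, 4, 5, 6
                .filter(i -> i % 2 == 0)   // transformation: 4, 6
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // data sink: print every surviving element
        run().forEach(System.out::println);
    }
}
```

The values that reach the sink are 4 and 6. In Flink, when the job runs with a parallelism greater than one, `print()` prefixes each line with the index of the subtask that emitted it, and the order of the lines is not guaranteed.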
49. RUNTIME
[Diagram: STREAMING DATAFLOW (PARALLELIZED VIEW). The operators SOURCE, MAP, FILTER and PRINT are grouped into OPERATOR CHAINS and OPERATORS; each runs as parallel SUBTASKS of a TASK, connected by STREAM PARTITIONS.]
A Flink cluster has a JOB MANAGER and multiple TASK MANAGERS. Each of those is a JVM.
52. COMPANY SNAPSHOT
More than 400 employees
Founded 2007 in Germany
Headquarters in Hamburg
+160m EUR revenue made in 2017
7 live games
>30 language versions
EBITDA margin of 25%
53. I AM LEGEND
OUR PORTFOLIO: Simulation, Strategy, RPG, Browser, Multi-device, Mobile
73. USE CASE NTCRM
EVENT BUS
EVENT
CLIENT
EVENTGATEWAY
PLAYER DATA
NTCRM
React to events with interstitials in < 10 seconds
74. USE CASE NTCRM
Elvenar has a trading feature that sometimes causes confusion. With NTCRM we can react to this and show more details within interstitials exactly when the player needs them.
75. JUST DO IT
DEMO TIME
Check it out on GitHub: https://github.com/prenomenon/codetalks-flinkdemo
76. GET IN TOUCH
InnoGames GmbH
Friesenstrasse 13
20097 Hamburg
https://www.innogames.com
Volker Janz
Senior Software Developer
Corporate Systems - Analytics
@prenomenon
Feedback appreciated!
85. STATE
OPERATOR STATE: Bound only to an operator
KEYED STATE: Bound to an operator and key
PLUGGABLE BACKEND
MULTIPLE PRIMITIVES SUPPORTED
GUARANTEED CONSISTENCY IN CASE OF A FAILURE
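The difference between the two kinds of state can be sketched in a few lines. This is a plain Java illustration, not Flink's state API: the class `StateSketch` and its methods are hypothetical, and the sketch shows only the scoping, not the pluggable backend or the consistency guarantees mentioned on the slide.

```java
import java.util.HashMap;
import java.util.Map;

final class StateSketch {
    // Analogous to operator state: one value for the whole operator instance.
    private long seenTotal = 0;

    // Analogous to keyed state: one independent value per key.
    private final Map<String, Long> seenPerKey = new HashMap<>();

    void onEvent(String key) {
        seenTotal++;
        seenPerKey.merge(key, 1L, Long::sum);
    }

    long total() {
        return seenTotal;
    }

    long countFor(String key) {
        return seenPerKey.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        StateSketch operator = new StateSketch();
        operator.onEvent("player-1");
        operator.onEvent("player-2");
        operator.onEvent("player-1");
        System.out.println(operator.total());              // 3
        System.out.println(operator.countFor("player-1")); // 2
        System.out.println(operator.countFor("player-2")); // 1
    }
}
```

In this sketch the map stands in for what a framework would manage per key on the operator's behalf.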
Editor's notes
Spark: Microbatching
Apex: YARN, bound to Hadoop
Beam: DSL, supports multiple streaming engines (Flink, Spark)
Heron: Successor of Storm at Twitter
Kafka Streams: Part of Kafka
Kinesis: Running on AWS
Questions & Comments Slide (built on the chapter slides)