5. Architecture Choices
● Kinesis
○ Existing infrastructure for batch pipeline
● Flink
○ Scalable/performant distributed stream processor
○ SQL and streaming APIs
● Druid
○ Scalable in-memory columnar database
○ Support for geospatial data
○ Extensible
○ Native integration with Superset
6. Flink Streaming SQL
● Familiarity with SQL
● Powerful semantics for data manipulation
● Streaming and batch mode
7. UDFs
● Geohash
● Geo region extraction
● URL cardinality reduction/normalization
○ /users/d9cca721a735d/location -> /users/{hash}/location
○ /v1//api// -> /v1/api
● User agent parsing
○ OS name / version
○ App name / version
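
As a rough illustration of two of the UDFs listed above, here is a minimal sketch in plain Python. It is not the implementation from the talk: the function names `normalize_url_path` and `geohash_encode` are hypothetical, and the rule for spotting an ID segment (a hex string of 12 or more characters) is an assumption chosen only so that the slide's examples work. Registering these functions as Flink UDFs is left out.

```python
import re

# Assumption: a path segment is an opaque ID if it is 12+ hex characters.
_HASH_SEGMENT = re.compile(r"[0-9a-fA-F]{12,}")

# Standard geohash base-32 alphabet (omits a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def normalize_url_path(path: str) -> str:
    """Collapse repeated/trailing slashes and replace ID-like segments."""
    segments = [s for s in path.split("/") if s]  # drops empty segments
    cleaned = ["{hash}" if _HASH_SEGMENT.fullmatch(s) else s for s in segments]
    return "/" + "/".join(cleaned)


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    print(normalize_url_path("/users/d9cca721a735d/location"))
    print(normalize_url_path("/v1//api//"))
    print(geohash_encode(57.64911, 10.40744, precision=5))
```

Both functions are pure and stateless, which is the property that makes this kind of logic easy to call per row from a streaming SQL query. A shorter geohash precision gives coarser cells and therefore lower cardinality, the same trade-off the URL normalization makes for paths.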