At Under Armour Connected Fitness, we’ve built an event streaming platform on top of Kafka and the Confluent stack that makes it easy for developers to produce and consume schema-based events without requiring direct knowledge of Kafka, and we are constantly trying to improve the developer experience. The platform consists of multiple federated Kafka clusters, a schema registry, a topology service, an archiver, specialized client libraries, and Web/CLI tools that assist developers with producer and consumer workflows.
In this talk, we will take a deeper dive into the design and implementation of our Scala/Java client library, which lets developers produce or consume events without worrying about the underlying infrastructure or its location, while enjoying the benefits of data compatibility through schemas. We’ll also look at an HTTP-based client proxy that exposes the same API for languages without native support. Finally, we’ll walk through the Web and CLI tools we built to make working with the platform easier.
The content of this talk will be primarily aimed at software developers looking for ideas on how to build Kafka client tools that allow producer/consumer interactions protected by schema-based event definitions while hiding details of the underlying infrastructure.
6. 6
Under Armour Connected Fitness
• November 2013 - Under Armour acquires MapMyFitness Inc
• February 2015 - Under Armour acquires MyFitnessPal
• February 2015 - Under Armour acquires Endomondo
• January 2016 - Under Armour announces HealthBox, Gemini 2 RE
8. 8
MyFitnessPal and Kafka
• MFP started as a Rails monolith
• Broken into microservices written in Scala and Ruby
• Data integration challenges
• Service dependencies difficult to manage
14. 14
Other Challenges…
• Client libraries for non-JVM languages were of varying quality
• Developers needed to know about Kafka
• Wanted to federate Kafka clusters - no one team should have to maintain all clusters
19. 19
Challenges Recap
• Engineers needed to know a lot about Kafka clusters
• Data migrations broke consumer contracts
• Client libraries for non-JVM languages
• Management of Kafka clusters
• Data retention policies
20. 20
Location Transparency
• A publishing client needn’t be concerned with clusters, topics, etc.
• Need some kind of source of truth for event locations
21. 21
Topology Service
• Each event has a namespace and event type (globally unique)
• The topology service instructs clients where to publish or consume those messages
• Introduces concept of “zones” which represent one or more clusters
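The lookup described on this slide can be sketched as a mapping from (namespace, event type) to zones. This is only an illustration: the `Zone` class, the zone name, the broker addresses, and `resolve_zones` are hypothetical and are not the platform's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A named zone backed by one or more Kafka clusters (hypothetical model)."""
    name: str
    clusters: tuple  # bootstrap server strings, one per cluster


# Hypothetical topology table; a real service would own and serve this data.
TOPOLOGY = {
    ("mmf", "ActivityFeedStoryUpdate"): [
        Zone("zone-a", ("kafka-a1:9092", "kafka-a2:9092")),
    ],
}


def resolve_zones(namespace, event_type, topology=TOPOLOGY):
    """Return the zones a client should publish to or consume from."""
    try:
        return topology[(namespace, event_type)]
    except KeyError:
        raise LookupError(
            f"no topology entry for {namespace}.{event_type}"
        ) from None
```

A client library built this way only needs the event's namespace and type; cluster addresses stay behind the lookup.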
22. 22
Data Migrations
• Solved problem - use Schemas
• Confluent Schema Registry + Small Service to capture Metadata (event type and namespace)
• Confluent Schema Registry uses Avro, so we do too
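One reason schemas help with migrations: under Avro's schema resolution rules, a reader using a newer schema can only read older data if every field it adds carries a default. The sketch below checks just that one rule on schemas expressed as Python dicts. The function name is hypothetical, and a real registry compatibility check covers considerably more (type promotion, removed fields, nested records).

```python
def added_fields_without_default(old_schema, new_schema):
    """List fields added in new_schema that lack a default value.

    A non-empty result means a reader on new_schema could not resolve
    records written with old_schema (simplified, top-level fields only).
    """
    old_names = {field["name"] for field in old_schema["fields"]}
    return [
        field["name"]
        for field in new_schema["fields"]
        if field["name"] not in old_names and "default" not in field
    ]


v1 = {"type": "record", "name": "User",
      "fields": [{"name": "username", "type": "string"}]}

# Adds a nullable field with a default: old records remain readable.
v2 = {"type": "record", "name": "User",
      "fields": [{"name": "username", "type": "string"},
                 {"name": "age", "type": ["null", "int"], "default": None}]}

# Adds a field with no default: old records cannot be resolved.
v3 = {"type": "record", "name": "User",
      "fields": [{"name": "username", "type": "string"},
                 {"name": "age", "type": "int"}]}
```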
23. 23
Data Migrations
Status values: pending, available
{
"event_type": "ActivityFeedStoryUpdate",
"namespace": "mmf",
"status": "pending",
"confluent_subject": "mmf_activityfeedstoryupdate",
"schema_id": "bb68e5381e88d52574b0f50a000fbe9b"
}
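In the record above, the subject looks like the namespace and the lowercased event type joined by an underscore. The sketch below encodes that pattern and a pending-to-available status change. Both are assumptions inferred from this single example; `subject_for` and `mark_available` are hypothetical names, not functions of the metadata service.

```python
def subject_for(namespace, event_type):
    """Derive a subject name (naming rule inferred from one example)."""
    return f"{namespace}_{event_type}".lower()


def mark_available(record):
    """Return a copy of a pending metadata record with status 'available'.

    Assumes 'pending' is the only state that may move to 'available'.
    """
    if record["status"] != "pending":
        raise ValueError(f"unexpected status: {record['status']}")
    return {**record, "status": "available"}


record = {
    "event_type": "ActivityFeedStoryUpdate",
    "namespace": "mmf",
    "status": "pending",
    "confluent_subject": subject_for("mmf", "ActivityFeedStoryUpdate"),
    "schema_id": "bb68e5381e88d52574b0f50a000fbe9b",
}
```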
33. 33
Data Retention
• Archiving is now just a job for a specialized consumer
• Archiving is done “per-zone”. Data that shouldn’t be archived is published only to zones that are not archived (decided per event type)
• In our case, data is stored in S3 and then accessed through a variety of tools for analysis, batch processing, etc.
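Per-zone archiving can be pictured as a flag on each zone plus a routing table from event types to zones. The sketch below is illustrative only: the zone names, the `archived` flag, and the `SessionHeartbeat` event type are all made up for the example.

```python
# Hypothetical zone configuration: which zones an archiver consumes from.
ZONES = {
    "analytics": {"archived": True},
    "ephemeral": {"archived": False},
}

# Hypothetical routing of (namespace, event_type) to zones.
EVENT_ZONES = {
    ("mmf", "ActivityFeedStoryUpdate"): ["analytics"],
    ("mmf", "SessionHeartbeat"): ["ephemeral"],
}


def archived_event_types(zones, event_zones):
    """Return event types published to at least one archived zone."""
    return sorted(
        key
        for key, zone_names in event_zones.items()
        if any(zones[name]["archived"] for name in zone_names)
    )
```

With this layout the archiver needs no per-event rules: it consumes every topic in archived zones, and keeping an event out of the archive is a routing decision.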
37. 37
Help Publishers - Avro Helper Library
• helpful-avro Scala library
• Adds a layer of robustness
• Tries a few tricks to make a payload validate against a schema
40. 40
Nullable / Optional Fields
age omitted, username not type annotated:
{
"first_name": "Paul",
"last_name": "Osman",
"username": "paulosman"
}
{
"first_name": "Paul",
"last_name": "Osman",
"age": {"null":null},
"username": {"string": "paulosman"}
}
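The transformation between the two payloads above can be sketched as follows. This is not helpful-avro's implementation (that library is written in Scala and its internals are not shown in the slides); `annotate` and `USER_SCHEMA` are hypothetical, and the sketch assumes unions are simple two-branch nullable unions such as ["null", "string"]. It reproduces the shape shown on the slide; note that the Avro specification's JSON encoding writes the null branch of a union as a bare `null`.

```python
USER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "first_name", "type": "string"},
        {"name": "last_name", "type": "string"},
        {"name": "age", "type": ["null", "int"]},
        {"name": "username", "type": ["null", "string"]},
    ],
}


def annotate(payload, schema):
    """Fill omitted nullable fields and type-annotate union values."""
    result = {}
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        if not isinstance(ftype, list):
            # Plain required field: copy as-is (KeyError if it is missing).
            result[name] = payload[name]
            continue
        value = payload.get(name)
        if value is None:
            if "null" not in ftype:
                raise ValueError(f"field {name!r} is not nullable")
            result[name] = {"null": None}
        else:
            # Assumes exactly one non-null branch in the union.
            branch = next(t for t in ftype if t != "null")
            result[name] = {branch: value}
    return result
```

Calling `annotate` on the first payload with `USER_SCHEMA` yields a dict with the same structure as the second payload.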
41. 41
Observability
• Browser and CLI based tools that allow people to observe activity being published to a specific zone
• Give people a way to see their event go through the system
• End to end monitoring, monitoring of consumer lag
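Consumer lag, mentioned on this slide, is commonly computed per partition as the log end offset minus the consumer group's committed offset. A minimal sketch, assuming both sets of offsets have already been fetched into plain dicts keyed by partition; the function name is hypothetical, and treating a missing committed offset as 0 is a simplifying assumption.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Return per-partition lag: log end offset minus committed offset.

    Partitions with no committed offset are treated as committed at 0,
    which is an assumption made for this sketch.
    """
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in end_offsets.items()
    }


def total_lag(end_offsets, committed_offsets):
    """Sum the lag across all partitions of a topic."""
    return sum(consumer_lag(end_offsets, committed_offsets).values())
```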
42. 42
Future Plans
• Make schema authoring and registration easier and more automated
• Extend helpful-avro to work with Case Classes and POJOs
• Further hide implementation details
43. 43
CONFIDENTIAL & BUSINESS PROPRIETARY INFORMATION OF UNDER ARMOUR, INC. COPYRIGHT (C)2015
Thank You
http://underarmour.jobs