Tesla ingests trillions of events every day from hundreds of unique data sources through our streaming data platform. Find out how we developed a set of high-throughput, non-blocking primitives that allow us to transform and ingest data into a variety of data stores with minimal development time. Additionally, we will discuss how these primitives allowed us to completely migrate the streaming platform in just a few months. Finally, we will talk about how we scale team size sub-linearly with data volume, while continuing to onboard new use cases.
14. A Simple Use Case
• Gzipped, custom event format
• Collected in an edge Kafka cluster
• Land in central DataLake for analysts
• …
• Profit!
15. Mirror from Edge Kafka
• It’s just another Channel application
• No surprise bugs or operations
• Regex mapping
• Sampling
edge.cool_data → cool_data
edge.legacy_data
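The regex mapping above can be sketched as a small prefix-stripping mapper. This is a hypothetical `TopicMapper` illustrating the idea, not the platform's actual implementation:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the regex topic mapping: strip the "edge."
// prefix so edge.cool_data in the edge cluster lands as cool_data
// in the central cluster.
public class TopicMapper {
    private static final Pattern EDGE_TOPIC = Pattern.compile("^edge\\.(.+)$");

    // Returns the central topic name, or empty if the topic doesn't match.
    public static Optional<String> centralTopic(String edgeTopic) {
        Matcher m = EDGE_TOPIC.matcher(edgeTopic);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Because the mapping is a pattern rather than a hard-coded list, new edge topics are mirrored without a code change.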
17. Raw to Canonical
• Many-to-many
• Decoders: gzip, b64
• Built-in parsers: JSON, CSV
• Custom Parser
• Flexible for unplanned uses
• Highly parallelizable
Pipeline: Kafka Source → Commit → Decode → Parse → Produce
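The decode and parse stages can be sketched in plain Java. This is a simplified example assuming gzipped CSV input; the class and method names are ours, not the platform's:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class RawToCanonical {
    // Helper for round-trip demos: gzip a payload.
    public static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Decoder stage: gunzip a raw payload. A base64 decoder could be
    // chained in front of this, as the slide mentions.
    public static byte[] gunzip(byte[] raw) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(raw));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            in.transferTo(out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parser stage: a naive header-row CSV parse into one map per row,
    // standing in for the platform's built-in CSV parser.
    public static List<Map<String, Object>> parseCsv(String text) {
        String[] lines = text.strip().split("\n");
        String[] header = lines[0].split(",");
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            String[] cells = lines[i].split(",");
            Map<String, Object> row = new LinkedHashMap<>();
            for (int j = 0; j < header.length; j++) {
                row.put(header[j], cells[j]);
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Each stage is a pure byte-in/records-out function, which is what makes the pipeline easy to parallelize across partitions.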
18. Parser API
public Iterator<Map<String, Object>> parse(byte[] bytes)
• Exception during call -> Skip record
• Exception during iteration -> Halt the stream
• Sometimes means early materialization
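One way to satisfy this contract is to materialize eagerly: malformed input then fails during the `parse()` call (one skipped record) instead of mid-iteration (a halted stream). A hypothetical key=value parser as a sketch, not the actual Tesla code:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical custom parser matching the slide's signature. It parses
// newline-delimited key=value records eagerly ("early materialization"),
// so bad input throws during the parse() call -- which the framework
// treats as "skip record" -- rather than during iteration, which would
// halt the stream.
public class KvParser {
    public Iterator<Map<String, Object>> parse(byte[] payload) {
        String text = new String(payload, StandardCharsets.UTF_8);
        List<Map<String, Object>> out = new ArrayList<>();
        for (String line : text.strip().split("\n")) {
            String[] kv = line.split("=", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("bad record: " + line);
            }
            out.add(Map.of(kv[0], (Object) kv[1]));
        }
        return out.iterator(); // fully materialized: no exceptions past this point
    }
}
```

The trade-off is memory: eager materialization buffers the whole payload's records, which is fine for small events but worth lazy iteration for very large ones.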
24. Kafka Monitoring
• Kafka is very simple to monitor + observe
• One dashboard can tell you everything at a glance
But… people don’t think in offsets and counts
Percentage-based SLOs and time-based lag monitoring
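Time-based lag is simply the age of the newest record a consumer has committed, which maps directly onto an SLO. A minimal sketch (the names are ours, not a real monitoring API):

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of time-based lag monitoring: instead of reporting
// offset counts, report how old the most recently committed record is,
// and compare that age against a time-based SLO.
public class TimeLag {
    public static Duration lag(Instant lastCommittedRecordTimestamp, Instant now) {
        return Duration.between(lastCommittedRecordTimestamp, now);
    }

    public static boolean violatesSlo(Duration lag, Duration slo) {
        return > 0;
    }
}
```

"Consumer is 5 minutes behind" is actionable in a way "consumer is 1.2M offsets behind" is not, because the same offset count means very different delays on different topics.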
25. Operations
• Many open source tools
• Kafka Monitor https://github.com/linkedin/kafka-monitor
• Burrow https://github.com/linkedin/Burrow
• Cruise Control https://github.com/linkedin/cruise-control
• Our own tools https://github.com/teslamotors/kafka-helmsman
• Freshness tracker
• Topic Enforcer
• Rolling Restart
26. Kubernetes
• Dynamic scalability
• Incidents or usual growth
• Handle daily peaks
• Load smearing across streams
• Not free – infra is non-trivial
27. What about when things go sideways?
• A rack fails
• Your database chokes
• The network is having a bad day
And your users need their data RIGHT NOW!
28. Channels Backfill
• “Freshest” data can be ingested immediately
• Looks just like a regular channel
• Just select a range in the past & deploy
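A minimal sketch of the time-range idea, assuming a backfill channel is configured like a regular channel plus a past [start, end) window, with records outside the window dropped (the class and names are hypothetical):

```java
import java.time.Instant;

// Hypothetical sketch: a backfill channel carries an extra time window,
// and only records whose timestamps fall inside [start, end) are
// reprocessed. Everything else about the channel stays the same.
public class BackfillWindow {
    private final Instant start; // inclusive
    private final Instant end;   // exclusive

    public BackfillWindow(Instant start, Instant end) {
        this.start = start;
        this.end = end;
    }

    public boolean inRange(Instant recordTime) {
        return !recordTime.isBefore(start) && recordTime.isBefore(end);
    }
}
```

Because the backfill path reuses the regular channel machinery, operators reason about one system whether they are serving live data or repairing history.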
30. Summary
• Lots of kinds of data + IoT challenges
• Simplicity for operations at scale
• Backpressure, non-blocking, high-throughput
• Flexible, configuration-based