Kafka + Uber: The World’s Realtime Transit Infrastructure, Aaron Schildkrout

How Uber uses Kafka to drive our real-time business.

  • Duration: Keynote is 15 mins long

    Good morning!
    My name is Aaron Schildkrout.
    I run Data and Marketing at Uber.

    I’m here today to talk to you about our Realtime journey at Uber - and particularly the critical and hugely empowering role Kafka (including Confluent and the whole Kafka community) has played in this journey.


  • Uber is realtime transit infrastructure for the globe.

    We’ve stated many times that we want this infrastructure to be as reliable as running water.

    A utility. A right even.

    A project that started out as a cool app to get you black cars on demand is quickly becoming one of the largest global infrastructure inventions of all time.

    And - like the cars moving on the streets outside right now - it is all taking place now and now and now. It is real time.
  • We’re not the only ones.

    The internet is quite literally penetrating our lives.
    -our cities
    -our relationships
    -our bodies

    This is a known story.

    But it’s getting more radical by the day. And as this penetration increases - in volume, in immediacy, in depth - there is an unbelievable increase in the need for systems that facilitate the flow of information, in real time, between our lives and our machines and back again.

    That’s why we’re all here.
  • Compressing time and space - is...a non-trivial technical problem.

    Uber, for instance, has always sought to provide this kind of truly responsive, realtime infrastructure.

    But in the beginning we were...just starting. This is surge circa 2012/3 in our driver app.

    Our first version of surge, v1, used data it queried directly from our dispatch service.
    There was only one Node.js process per city.
    The geofences were very big and not granular at all, causing a lot of problems and huge inefficiency.
  • This is surge today - with the addition of much more granular geo-temporal surge targeting.

    We are updating - in real-time - our understanding of supply and demand in highly specific geographies to allow us to calculate surge in the hexagons shown in this screen.

    This system now runs on Kafka, as opposed to our janky Node.js query. It took us a bit of time to make this truly work at our exponentially exploding global scale, but we’ve gotten...at least closer.

    That’s the story I’ll tell today.
  • To get the obvious architectural diagram out of the way - here’s how Kafka 8 is currently used @ Uber.
  • The real-time infrastructure ecosystem at Uber, which includes Kafka, powers many key pieces of our business. I think of it in terms of this topology...
  • Surge, as noted earlier.
  • FRAUD MODELS
  • ETA - real-time system
  • Cities use real-time operational analytics to actively manage their cities, making adjustments in dispatch, messaging, etc. to optimize city functioning.
    Much of Uber’s success has to do with the amazing speed and agility of our on-the-ground global city teams - and much of this comes from empowering them with realtime tools.
  • We’ve recently applied this same type of infrastructure to our Uber Eats business, which is rapidly scaling now and involves significant operational complexity.
  • Internal analytics on our experimentation (XP) pipeline is a real-time system. That pipeline now powers the creation of hundreds of new experiments weekly, and our teams act on it daily through rapid data feedback loops.
  • Pretty awesome. But it took a long journey to get there.

    2013 - we first launched Kafka 7, and each application essentially ran its own Kafka cluster.

    2014 - we started the transition to Kafka 8 (K8), moving all our Kafka 7 (K7) data to K8 through the K7 migrator.

    2015 to today - we deployed a fully functional K8 pipeline: stable, with scalable producers and consumers and multi-DC, multi-language support.
  • Along the way we ran into some significant limitations. I’ll now walk through the work we did to complete our migration to Kafka 8 and, more fundamentally, to make Kafka work at our scale.
  • We implemented REST proxy improvements, adding a new binary protocol for high throughput.
    By building REST client libraries, we facilitated multi-language support (which was important given our 4-language environment)
  • We automated schema and topic management. In a world with many thousands of topics and hundreds of engineers and teams producing data, the absence of strong tools for schema inference, enforcement and management was a huge pain point.
  • We built Mirrormaker 2.0, which we’ll soon be open sourcing…

    It’s more robust, easier to operate, and allows for dynamic topic addition.
  • And…
    We built a series of data auditing tools that let us track data loss and latency spikes at different points in the Kafka pipeline. At scale, these became critical for triaging and solving problems quickly.
  • All Kafka data producers at Uber are now running Kafka 8. The project has been a huge success and is now powering much of Uber’s data infrastructure. It is...mission critical.
  • The goal is to shrink the barrier between real-time infrastructure and analytical usage.
  • We’re currently capturing accelerometer data from the driver’s / rider’s phone via Kafka. This data is then used for:
    Detecting traffic / road conditions? (need to confirm)
    1) We use our motionstash data to generate safety models and safety scores for all our drivers (supervised machine learning and classification algorithms).
    2) We do per-trip ad hoc analysis for safety by computing safety scores per driver.
    We also use the models generated in 1) to predict in real time and alert a driver about their unsafe driving.

    1. KAFKA + Building the World's Realtime Transit Infrastructure
    2. For Illustration only
    3. SURGE - CIRCA 2013
    4. SURGE - CIRCA 2016
    5. KAFKA 8 ECOSYSTEM @UBER (architecture diagram). Labels: Data Producers; Rider App; Driver App; API / Services; Dispatch (gps logs); Kafka; Real-time Pipeline; Storm; Samza; ELK; Batch Pipeline; Hadoop; Vertica; Data Consumers; Real-time, Fast Analytics; Applications; Surge; Mobile App; Mapping & Logistics; Alerts, Dashboards; Debugging; Ad-hoc exploration; Data Science; Analytics; Reporting
    6. INFRASTRUCTURE ECOSYSTEM: Product Features, Predictive Models, Operational Analytics, Business Intelligence
    7. NEAR REALTIME PRICE SURGING (PRODUCT FEATURES)
    8. FRAUD / ANOMALY DETECTION (PREDICTIVE MODELS)
    9. ETA (PREDICTIVE MODELS)
    10. OPERATIONAL ANALYTICS
    11. UberEATs (OPERATIONAL ANALYTICS)
    12. XP (OPERATIONAL ANALYTICS)
    13. BUSINESS INTELLIGENCE
    14. KAFKA 7 WORLD -> KAFKA 7 MIGRATOR -> KAFKA 8. 2013: Kafka 7 + Mirrormaker deployed everywhere (Limited Availability, Difficult to Scale, Not multi-DC, Multi-lang incompatibility, Difficult to Operate, Producer Scale Issues). 2014: Kafka 7 migrator deployed everywhere. 2015 - 2016: New Kafka 8 pipeline (High Availability, High Scalability, Multi-DC, multi-language support).
    15. Kafka 7, Mirrormaker 2.0, Rest architecture, Data Audit, Automated Topic Mgmt
    16. REST ARCHITECTURE: Logs, Business events, Async REST library, Local spooling, Rest Proxy, High throughput custom protocol, Data Audit
    17. Automated Schema and Topic Management
    18. MIRROR MAKER 2.0: Source DC -> Destination DC. Mirrormaker 2.0: Robust, Data Audit, Dynamic topics
    19. DATA AUDIT FOR KAFKA MESSAGES: Msg counts across multiple DCs, End-end latencies across multiple DCs
    20. Mirrormaker 2.0, Rest architecture, Data Audit, Kafka 8, Automated Topic Mgmt
    21. A ROBUST FUTURE: 0 data loss messaging system, Data discovery and lineage, Quota management, Self-correcting brokers, Active-active data pipelines
    22. THE PRESENT: Real-time Data -> Custom Application -> Real-time decision. THE FUTURE: Real-time Data -> Dynamic SQL(ish) -> Real-time decision.
    23. TELEMATICS
    24. SELF DRIVING CAR
    25. Thank you, Kafka Community!
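Slide 16 and the REST notes describe producers that write through an async REST client library, with local spooling in front of a REST proxy. The Python sketch below illustrates only the local-spooling idea, and it rests on stated assumptions. `SpoolingProducer`, the `send(topic, messages)` transport (which returns True on success), the batch size and the newline-delimited JSON spool file are all hypothetical. None of this reflects Uber's actual client, proxy API or binary protocol, and the sketch is synchronous for brevity.

```python
import json
import os
from typing import Callable, Dict, List, Tuple

# Hypothetical transport: send(topic, messages) -> True if the REST proxy accepted the batch.
SendFn = Callable[[str, List[dict]], bool]


class SpoolingProducer:
    """Illustrative (hypothetical) producer: batches messages per topic and,
    when a send fails, appends the batch to a local spool file for replay."""

    def __init__(self, send: SendFn, spool_path: str, batch_size: int = 100):
        self.send = send
        self.spool_path = spool_path
        self.batch_size = batch_size
        self.buffer: List[Tuple[str, dict]] = []

    def produce(self, topic: str, message: dict) -> None:
        self.buffer.append((topic, message))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Group buffered messages by topic so each topic is sent as one request.
        by_topic: Dict[str, List[dict]] = {}
        for topic, msg in self.buffer:
            by_topic.setdefault(topic, []).append(msg)
        self.buffer = []
        for topic, msgs in by_topic.items():
            if not self.send(topic, msgs):
                # Proxy unreachable or batch rejected: spool locally instead of dropping.
                with open(self.spool_path, "a", encoding="utf-8") as f:
                    for m in msgs:
                        f.write(json.dumps({"topic": topic, "msg": m}) + "\n")

    def replay_spool(self) -> None:
        """Re-send spooled messages; anything that fails again is re-spooled."""
        if not os.path.exists(self.spool_path):
            return
        with open(self.spool_path, encoding="utf-8") as f:
            records = [json.loads(line) for line in f if line.strip()]
        os.remove(self.spool_path)
        for rec in records:
            self.produce(rec["topic"], rec["msg"])
        self.flush()
```

The spool file trades memory for disk, so a proxy outage does not force the application to block or drop events. A production version would also need to survive a crash between `os.remove` and a successful resend. This sketch accepts that gap to stay short.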

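The data-audit notes and slide 19 describe tracking message counts and end-to-end latencies across multiple DCs to spot data loss and latency spikes. A minimal Python sketch of that idea follows. It assumes that each pipeline stage emits one hypothetical `AuditRecord` per observed message, carrying the producer's original timestamp. The stage names, the one-minute buckets and the record format are illustrative assumptions, not Uber's audit system.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

BUCKET_MS = 60_000  # assumed audit granularity: one-minute buckets keyed by produce time

Key = Tuple[str, int, str]  # (topic, bucket, stage)


@dataclass(frozen=True)
class AuditRecord:
    """Hypothetical audit event: one message observed at one pipeline stage."""
    stage: str        # e.g. "producer", "regional", "aggregate" (illustrative names)
    topic: str
    produced_ms: int  # timestamp stamped by the original producer
    observed_ms: int  # when this stage observed the message


def summarize(records: Iterable[AuditRecord]) -> Tuple[Dict[Key, int], Dict[Key, int]]:
    """Count messages and track the worst end-to-end latency per (topic, bucket, stage)."""
    counts: Dict[Key, int] = {}
    max_latency: Dict[Key, int] = {}
    for r in records:
        key = (r.topic, r.produced_ms // BUCKET_MS, r.stage)
        latency = r.observed_ms - r.produced_ms
        counts[key] = counts.get(key, 0) + 1
        max_latency[key] = max(max_latency.get(key, latency), latency)
    return counts, max_latency


def find_loss(counts: Dict[Key, int], upstream: str, downstream: str) -> Dict[Tuple[str, int], int]:
    """Buckets where the downstream stage saw fewer messages than the upstream one."""
    gaps: Dict[Tuple[str, int], int] = {}
    for (topic, bucket, stage), n in counts.items():
        if stage == upstream:
            missing = n - counts.get((topic, bucket, downstream), 0)
            if missing > 0:
                gaps[(topic, bucket)] = missing
    return gaps


records = [
    AuditRecord("producer", "trips", 1_000, 1_005),
    AuditRecord("producer", "trips", 2_000, 2_004),
    AuditRecord("aggregate", "trips", 1_000, 1_900),
]
counts, latency = summarize(records)
print(find_loss(counts, "producer", "aggregate"))  # {('trips', 0): 1}
print(latency[("trips", 0, "aggregate")])          # 900
```

A raw count becomes a loss signal only when you compare the same bucket at an upstream stage (for example, in the source DC) and a downstream stage (for example, in the destination DC). A real auditor would also wait for late-arriving messages before declaring a bucket incomplete.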