1 Proprietary & Confidential
Using Akka Streams
For Real Time Decision Making
Dustin Lyons
Engineering Manager, Data Platform
2 Proprietary & Confidential
Who I am
● Engineer turned Engineering Manager at Credit Karma
● Data & Analytics on the Platform team
● Build things that make decisions on where data should go
● Lover of science fiction, sushi, and electronic music
3 Proprietary & Confidential
Credit Karma is a free financial assistant, helping over
60 million people make progress.
4 Proprietary & Confidential
Agenda for today
1. Data Infrastructure at Credit Karma: Past and current
2. Mo’ data, mo’ problems
3. Akka Streams saves the day
4. Results and learnings
5. Q&A
5 Proprietary & Confidential
Data scale (MB/min) @ Credit Karma
6 Proprietary & Confidential
Credit Karma data platform: PHP days
PHP Scripts
7 Proprietary & Confidential
New tools to help with scale
8 Proprietary & Confidential
Credit Karma data platform: Scala in 2014
Data Warehouse Import
9 Proprietary & Confidential
New tools to help with concurrency
10 Proprietary & Confidential
Credit Karma data platform: Akka in 2015
Analytics Export Service + Data Warehouse Import
11 Proprietary & Confidential
Credit Karma data platform: Akka in 2015
Analytics Export Service + Data Warehouse Import
12 Proprietary & Confidential
Analytics export service
Coordinator
Data Transformer Workers
Kafka Importer Workers
Analytics Export Service
HTTP Ingest Server
13 Proprietary & Confidential
Analytics export service
14 Proprietary & Confidential
Analytics export service
Coordinator
Data Transformer Workers
Kafka Importer Workers
Analytics Export Service
HTTP Ingest Server
15 Proprietary & Confidential
Analytics export service
16 Proprietary & Confidential
Data warehouse import
Reader Deduplicator Processor Extractors
Data Warehouse Import Service
17 Proprietary & Confidential
Data warehouse import
18 Proprietary & Confidential
Marble maze
19 Proprietary & Confidential
Marble maze
20 Proprietary & Confidential
Marble maze
21 Proprietary & Confidential
Marble maze
22 Proprietary & Confidential
Marble maze
1 Reading from file
23 Proprietary & Confidential
Marble maze
1 Reading from file
2 Waiting for external service
24 Proprietary & Confidential
Marble maze
1 Reading from file
2 Waiting for external service
3 Objects sit in heap
25 Proprietary & Confidential
Marble maze
1 Reading from file
2 Waiting for external service
3 Objects sit in heap
4 Database insert
26 Proprietary & Confidential
Backpressure
27 Proprietary & Confidential
What is backpressure?
Backpressure refers to the buildup of data at an I/O switch
when buffers are full and not able to receive additional data.
No additional data packets are transferred until the
bottleneck of data has been eliminated or the buffer has been
emptied.
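The definition above can be illustrated with a bounded buffer. This is a minimal sketch in plain Python, not Akka Streams code: the names `run_pipeline` and `demo` and the capacity of 8 are assumptions made up for illustration. The producer is suspended whenever the buffer is full, so the number of objects waiting in memory never exceeds the buffer's capacity.

```python
import asyncio


async def run_pipeline(n_items: int, capacity: int):
    """Push n_items through a bounded buffer; return (consumed, peak_buffer_size)."""
    buffer: asyncio.Queue = asyncio.Queue(maxsize=capacity)
    consumed = []
    peak = 0

    async def producer():
        nonlocal peak
        for i in range(n_items):
            # put() suspends while the buffer is full: this is the backpressure.
            await buffer.put(i)
            peak = max(peak, buffer.qsize())
        await buffer.put(None)  # sentinel marking the end of the stream

    async def consumer():
        while True:
            item = await buffer.get()
            if item is None:
                break
            await asyncio.sleep(0)  # stand-in for slow downstream work
            consumed.append(item)

    await asyncio.gather(producer(), consumer())
    return consumed, peak


def demo(n_items: int = 100, capacity: int = 8):
    """Run the pipeline to completion and return its results."""
    return asyncio.run(run_pipeline(n_items, capacity))
```

Without the `maxsize` bound the producer would never wait, and every unread item would sit in the heap until the consumer caught up.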
28 Proprietary & Confidential
Analytics export service
Coordinator
Data Transformer Workers
Kafka Importer Workers
Analytics Export Service
HTTP Ingest Server
29 Proprietary & Confidential
Analytics export service
Coordinator
Data Transformer Workers
Kafka Importer Workers
Analytics Export Service
HTTP Ingest Server
30 Proprietary & Confidential
Analytics export service
Coordinator
Data Transformer Workers
Kafka Importer Workers
Analytics Export Service
HTTP Ingest Server
31 Proprietary & Confidential
Data warehouse import
Reader Deduplicator Processor Extractors
Data Warehouse Import Service
32 Proprietary & Confidential
Akka Streams: Backpressure in action
Actor Actor
Data
Demand
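The data/demand exchange on this slide can be sketched as a pull protocol: the downstream side signals how many elements it can accept, and the upstream side emits at most that many. The snippet below is a plain-Python illustration under that assumption, not the Akka Streams or Reactive Streams API; `DemandDrivenSource`, `request`, and `drain` are hypothetical names.

```python
class DemandDrivenSource:
    """Upstream stage that only emits in response to downstream demand."""

    def __init__(self, items):
        self._it = iter(items)
        self.emitted = 0  # total number of elements sent downstream so far

    def request(self, n):
        """Emit at most n elements; an empty list means the source is exhausted."""
        batch = []
        for _ in range(n):
            try:
                batch.append(next(self._it))
            except StopIteration:
                break
        self.emitted += len(batch)
        return batch


def drain(source, demand_per_round):
    """Downstream stage: repeatedly signal demand until the source is exhausted."""
    received = []
    rounds = 0
    while True:
        batch = source.request(demand_per_round)
        if not batch:
            break
        received.extend(batch)
        rounds += 1
    return received, rounds
```

Because nothing is emitted without a prior `request`, a slow downstream stage automatically slows the upstream one instead of letting unprocessed elements pile up.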
33 Proprietary & Confidential
Akka Streams: Creating a stream
Source → Flow → Sink
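As a rough analogy for the three-part shape on this slide, Python generators can play the roles of source, flow, and sink, since a generator only produces a value when the stage after it asks for one. This is an illustrative sketch only; the function names `source`, `flow`, `sink`, and `run_stream` are assumptions and do not correspond to Akka Streams calls.

```python
from typing import Callable, Iterable, Iterator, List


def source(n: int) -> Iterator[int]:
    """Source: emits the integers 1..n, one at a time, only when pulled."""
    yield from range(1, n + 1)


def flow(upstream: Iterable[int], fn: Callable[[int], int]) -> Iterator[int]:
    """Flow: transforms each element as it passes through."""
    for element in upstream:
        yield fn(element)


def sink(upstream: Iterable[int]) -> List[int]:
    """Sink: pulls elements through the pipeline and collects them."""
    return list(upstream)


def run_stream(n: int) -> List[int]:
    """Wire source -> flow -> sink and run the pipeline to completion."""
    return sink(flow(source(n), lambda x: x * 2))
```

Nothing runs until the sink starts pulling, which mirrors the idea that a stream is described first and executed only when it is materialized.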
34 Proprietary & Confidential
Akka Streams: Built-in stages
Built-in Sources
• actorRef
• actorPublisher
• fromIterator
• fromFile
• apply (from a Seq)
Built-in Processing Stages
• map
• filter
• grouped
• drop/take
• dropWhile/takeWhile
• sliding
Built-in Sinks
• head
• last
• seq
• foreach
• actorRef
• actorSubscriber
• reduce
• fold
Backpressure-Aware Stages
• mapAsync
• batch
• buffer (Backpressure)
• buffer (Drop)
• buffer (Fail)
Reference: http://doc.akka.io/docs/akka/current/scala/stream/stages-overview.html
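The three buffer variants listed above differ in what happens when the buffer is full. The sketch below illustrates that difference in plain Python; it is not the Akka Streams `buffer` stage. `BoundedBuffer`, `BufferOverflow`, and `offer` are hypothetical names, and the "drop" variant here discards the oldest element, which is only one of several possible drop policies.

```python
from collections import deque


class BufferOverflow(Exception):
    """Raised by the 'fail' strategy when the buffer is full."""


class BoundedBuffer:
    """Fixed-capacity buffer with a configurable overflow strategy."""

    def __init__(self, capacity: int, strategy: str):
        if strategy not in ("backpressure", "drop", "fail"):
            raise ValueError(f"unknown strategy: {strategy}")
        self.capacity = capacity
        self.strategy = strategy
        self.items = deque()

    def offer(self, item) -> bool:
        """Try to add an item; return True if it was accepted."""
        if len(self.items) < self.capacity:
            self.items.append(item)
            return True
        if self.strategy == "fail":
            # Fail: treat overflow as an error and stop.
            raise BufferOverflow(f"buffer full at capacity {self.capacity}")
        if self.strategy == "drop":
            # Drop: discard the oldest element to make room (assumed policy).
            self.items.popleft()
            self.items.append(item)
            return True
        # Backpressure: refuse the item; the caller must wait and retry.
        return False
```

Backpressure keeps every element at the cost of slowing the producer, drop keeps the producer fast at the cost of losing data, and fail surfaces the overload as an error.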
35 Proprietary & Confidential
Analytics export service
Coordinator
Data Transformer Workers
Kafka Importer Workers
Analytics Export Service
HTTP Ingest Server
36 Proprietary & Confidential
Analytics export service
Coordinator
Analytics Export Service
HTTP Ingest Server
Akka Stream
37 Proprietary & Confidential
Analytics export service
38 Proprietary & Confidential
Data warehouse import
Reader Deduplicator Processor Extractors
Data Warehouse Import Service
39 Proprietary & Confidential
Data warehouse import
Extractors
Data Warehouse Import Service
Akka Stream
40 Proprietary & Confidential
Data warehouse import service
41 Proprietary & Confidential
Analytics export service heap (before)
[Chart: heap size in GiB over time, labeled 28 GiB. Red: heap space; blue: used heap space; purple: max heap space]
42 Proprietary & Confidential
Analytics export service heap (after)
[Chart: heap size in GiB over time, labeled 28 GiB]
43 Proprietary & Confidential
Data warehouse import
44 Proprietary & Confidential
Data warehouse import
45 Proprietary & Confidential
Data warehouse import
46 Proprietary & Confidential
Final results
• Akka Streams allowed us to move data with increased throughput and optimal performance
• No longer getting paged for JVM out-of-memory errors or spending time tuning our services
• Reduced the SLA for data delivery to our business stakeholders
47 Proprietary & Confidential
Lessons learned
• Akka Actors: Great for low latency
• Akka Streams: Optimized for high throughput and solving backpressure
• Akka Streams is built on top of Akka Actors
• Don’t try to build high-throughput systems with an actor system; you’ll just start building Akka Streams
48 Proprietary & Confidential
Thank you!
Q&A
Dustin Lyons
Engineering Manager, Data Platform

How Credit Karma Makes Real-Time Decisions For 60 Million Users With Akka Streams And Actors