How Credit Karma Makes Real-Time Decisions For 60 Million Users With Akka Streams And Actors


In this webinar, Dustin Lyons, Engineering Manager at Credit Karma, discusses a challenge his team faced not long ago, one shared by many financial services architects and engineering leaders: not only how to move from offline, batch-mode processing of Big Data to streaming Fast Data, but also how to enable real-time decision making based on the information flowing in from over 60 million members. Dustin reviews how his team migrated away from PHP and successfully implemented Akka Streams with Apache Kafka to ingest, process, and route real-time events throughout their data ecosystem.

At the end of this presentation, you’ll better understand:
* The design considerations for new Fast Data architectures, from streaming to microservices to real-time analysis
* Some lessons learned in progressing from batch to streaming using Akka, Spark, and Kafka
* Why the resilience of Akka’s self-healing actor model is what matters most when delivering real-time customer experiences

1 Proprietary & Confidential
Using Akka Streams for Real-Time Decision Making
Dustin Lyons
Engineering Manager, Data Platform

2 Proprietary & Confidential
Who I am
● Engineer turned Engineering Manager at Credit Karma
● Data & Analytics on the Platform team
● Build things that make decisions on where data should go
● Lover of science fiction, sushi, and electronic music

3 Proprietary & Confidential
Credit Karma is a free financial assistant, helping over
60 million people make progress.

4 Proprietary & Confidential
Agenda for today
1. Data Infrastructure at Credit Karma: Past and current
2. Mo’ data, mo’ problems
3. Akka Streams saves the day
4. Results and learnings
5. Q&A

5 Proprietary & Confidential
Data scale (MB/min) @ Credit Karma

6 Proprietary & Confidential
Credit Karma data platform: PHP days
PHP Scripts

7 Proprietary & Confidential
New tools to help with scale

8 Proprietary & Confidential
Credit Karma data platform: Scala in 2014
Data Warehouse Import

9 Proprietary & Confidential
New tools to help with concurrency

10 Proprietary & Confidential
Credit Karma data platform: Akka in 2015
Analytics Export Service + Data Warehouse Import

11 Proprietary & Confidential
Credit Karma data platform: Akka in 2015
Analytics Export Service + Data Warehouse Import

12 Proprietary & Confidential
Analytics export service
Coordinator
Data Transformer Workers
Kafka Importer Workers
Analytics Export Service
HTTP Ingest Server

13 Proprietary & Confidential
Analytics export service

14 Proprietary & Confidential
Analytics export service
Coordinator
Data Transformer Workers
Kafka Importer Workers
Analytics Export Service
HTTP Ingest Server

15 Proprietary & Confidential
Analytics export service

16 Proprietary & Confidential
Data warehouse import
Reader Deduplicator Processor Extractors
Data Warehouse Import Service

17 Proprietary & Confidential
Data warehouse import

18 Proprietary & Confidential
Marble maze

19 Proprietary & Confidential
Marble maze

20 Proprietary & Confidential
Marble maze

21 Proprietary & Confidential
Marble maze

22 Proprietary & Confidential
Marble maze
1 Reading from file

23 Proprietary & Confidential
Marble maze
1 Reading from file
2 Waiting for external service

24 Proprietary & Confidential
Marble maze
1 Reading from file
2 Waiting for external service
3 Objects sit in heap

25 Proprietary & Confidential
Marble maze
1 Reading from file
2 Waiting for external service
3 Objects sit in heap
4 Database Insert

26 Proprietary & Confidential
Backpressure

27 Proprietary & Confidential
What is backpressure?
Backpressure refers to the buildup of data at an I/O switch
when buffers are full and not able to receive additional data.
No additional data packets are transferred until the
bottleneck of data has been eliminated or the buffer has been
emptied.
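
As a rough illustration of the definition above, here is a plain-Python analogy (not Akka code): a bounded buffer forces a fast producer to wait for a slow consumer, so in-flight data never exceeds the buffer size. The names `run_pipeline`, `buffer_size`, and `peak` are invented for this sketch, and the 1 ms sleep merely stands in for a slow downstream stage.

```python
import queue
import threading
import time


def run_pipeline(n_items, buffer_size):
    """Push n_items through a bounded buffer; return (items consumed, peak depth)."""
    buf = queue.Queue(maxsize=buffer_size)  # bounded: put() blocks when full
    consumed = []
    peak = 0
    done = object()  # sentinel marking the end of the stream

    def consumer():
        while True:
            item = buf.get()
            if item is done:
                break
            time.sleep(0.001)  # simulate a slow downstream stage
            consumed.append(item)

    worker = threading.Thread(target=consumer)
    worker.start()
    for i in range(n_items):
        buf.put(i)  # blocks while the buffer is full: this is the backpressure
        peak = max(peak, buf.qsize())
    buf.put(done)
    worker.join()
    return consumed, peak
```

Calling `run_pipeline(50, 4)` delivers all 50 items in order while the observed buffer depth stays at or below 4. With an unbounded queue the producer would finish immediately and the backlog would sit in memory instead.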

28 Proprietary & Confidential
Analytics export service
Coordinator
Data Transformer Workers
Kafka Importer Workers
Analytics Export Service
HTTP Ingest Server

29 Proprietary & Confidential
Analytics export service
Coordinator
Data Transformer Workers
Kafka Importer Workers
Analytics Export Service
HTTP Ingest Server

30 Proprietary & Confidential
Analytics export service
Coordinator
Data Transformer Workers
Kafka Importer Workers
Analytics Export Service
HTTP Ingest Server

31 Proprietary & Confidential
Data warehouse import
Reader Deduplicator Processor Extractors
Data Warehouse Import Service

32 Proprietary & Confidential
Akka Streams: Backpressure in action
Actor Actor
Data
Demand
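
The Data/Demand arrows on this slide can be sketched in plain Python as follows. This is a synchronous simplification and not the Akka or Reactive Streams API (where demand is signalled asynchronously); `Publisher`, `request`, and `drain` are hypothetical names chosen for the illustration.

```python
class Publisher:
    """Emits items only when the subscriber has signalled demand."""

    def __init__(self, items):
        self._items = iter(items)

    def request(self, n):
        """Return at most n items; the subscriber decides n."""
        batch = []
        for _ in range(n):
            try:
                batch.append(next(self._items))
            except StopIteration:
                break
        return batch


def drain(publisher, batch_size):
    """Subscriber loop: ask for batch_size items, process them, then ask again."""
    received = []
    max_in_flight = 0
    while True:
        batch = publisher.request(batch_size)  # demand flows upstream
        if not batch:
            break
        max_in_flight = max(max_in_flight, len(batch))
        received.extend(batch)  # data flows downstream
    return received, max_in_flight
```

Because the publisher never sends more than was requested, the subscriber bounds how much data is in flight at any moment; here `drain(Publisher(range(10)), 3)` never holds more than 3 unprocessed items.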

33 Proprietary & Confidential
Akka Streams: Creating a stream
Source Flow Sink
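
The Source, Flow, Sink shape can be mimicked with Python generators, which are also pull-based: nothing is produced until the sink asks for the next element. This is only an analogy to show the composition; the function names and the even-squares transformation are made up for the example.

```python
def source(n):
    """Source: lazily yields 0..n-1, one element per downstream pull."""
    for i in range(n):
        yield i


def flow(upstream):
    """Flow: keeps even numbers and squares them."""
    for x in upstream:
        if x % 2 == 0:
            yield x * x


def sink(upstream):
    """Sink: pulls every element and collects it, which drives the whole pipeline."""
    return list(upstream)


result = sink(flow(source(10)))
```

In Akka Streams the same three roles are composed with `via` (attach a Flow) and `runWith` or `to` (attach a Sink), and the stream only runs once it has been materialized.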

34 Proprietary & Confidential
Akka Streams: Built-in stages
Built-in Sources
• actorRef
• actorPublisher
• fromIterator
• fromFile
• apply (from a Seq)
Built-in Processing Stages
• map
• filter
• grouped
• drop/take
• dropWhile/takeWhile
• sliding
Built-in Sinks
• head
• last
• seq
• foreach
• actorRef
• actorSubscriber
• reduce
• fold
Backpressure-Aware Stages
• mapAsync
• batch
• buffer (Backpressure)
• buffer (Drop)
• buffer (Fail)
Reference: http://doc.akka.io/docs/akka/current/scala/stream/stages-overview.html
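
The three buffer behaviours listed above (Backpressure, Drop, Fail) differ only in what happens when the buffer is full. The sketch below shows that difference in plain Python; it is not Akka's implementation. `BoundedBuffer`, `offer`, `poll`, and `BufferOverflow` are hypothetical names, and "drop" is assumed here to discard the oldest element (Akka offers several drop variants, and the slide does not say which one is meant).

```python
from collections import deque


class BufferOverflow(Exception):
    """Raised by the 'fail' strategy when the buffer is full."""


class BoundedBuffer:
    def __init__(self, capacity, strategy):
        if strategy not in ("backpressure", "drop", "fail"):
            raise ValueError(f"unknown strategy: {strategy}")
        self.capacity = capacity
        self.strategy = strategy
        self.items = deque()

    def offer(self, item):
        """Try to enqueue item; return True if it was accepted."""
        if len(self.items) < self.capacity:
            self.items.append(item)
            return True
        if self.strategy == "drop":
            self.items.popleft()  # assumption: discard the oldest element
            self.items.append(item)
            return True
        if self.strategy == "fail":
            raise BufferOverflow("buffer full")
        return False  # backpressure: the caller must wait and retry

    def poll(self):
        """Remove and return the oldest element, or None if the buffer is empty."""
        return self.items.popleft() if self.items else None
```

With backpressure, a full buffer refuses the element and nothing is lost; with drop, the newest data wins at the cost of older data; with fail, the overflow surfaces as an error so the stream can be stopped or restarted.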

35 Proprietary & Confidential
Analytics export service
Coordinator
Data Transformer Workers
Kafka Importer Workers
Analytics Export Service
HTTP Ingest Server

36 Proprietary & Confidential
Analytics export service
Coordinator
Analytics Export Service
HTTP Ingest Server
Akka Stream

37 Proprietary & Confidential
Analytics export service

38 Proprietary & Confidential
Data warehouse import
Reader Deduplicator Processor Extractors
Data Warehouse Import Service

39 Proprietary & Confidential
Data warehouse import
Extractors
Data Warehouse Import Service
Akka Stream

40 Proprietary & Confidential
Data warehouse import service

41 Proprietary & Confidential
Analytics export service heap (before)
[Chart: heap size in GiB over time, with a 28 GiB mark. Red: heap space. Blue: used heap space. Purple: max heap space.]

42 Proprietary & Confidential
Analytics export service heap (after)
[Chart: heap size in GiB over time, with a 28 GiB mark.]

43 Proprietary & Confidential
Data warehouse import

44 Proprietary & Confidential
Data warehouse import

45 Proprietary & Confidential
Data warehouse import

46 Proprietary & Confidential
Final results
• Akka Streams allowed us to move data with increased throughput and optimal performance
• No longer getting paged for JVM out of memory or spending time tuning our services
• Reduced the SLA for data delivery to our business stakeholders

47 Proprietary & Confidential
Lessons learned
• Akka Actors: Great for low latency
• Akka Streams: Optimized for high throughput and solving backpressure
• Built on top of Akka Actors
• Don’t try to build high-throughput systems with an actor system; you’ll just start building Akka Streams

48 Proprietary & Confidential
Thank you!
Q&A
Dustin Lyons
Engineering Manager, Data Platform

Contenu connexe

Tendances

Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive

confluent

Kafka makes so many things easier to do, from managing metrics to processing streams of data. Yet it seems that so many things we have done to this point in configuring and managing it have been object studies in how to make our lives, as the plumbers who keep the data flowing, more difficult than they have to be. What are some of our favorites? * Kafka without access controls * Multitenant clusters with no capacity controls * Worrying about message schemas * MirrorMaker inefficiencies * Hope and pray log compaction * Configurations as shared secrets * One-way upgrades We’ve made a lot of progress over the last few years improving the situation, in part by focusing some of this incredibly talented community towards operational concerns. We’ll talk about the big mistakes you can avoid when setting up multi-tenant Kafka, and some that you still can’t. And we will talk about how to continue down the path of marrying the hot, new features with operational stability so we can all continue to come back here every year to talk about it.

Running Kafka for Maximum Pain

Todd Palino

PayPal currently processes tens of billions of signals per day from different sources in batch and streaming mode. The data processing platform is the one powering these different analytical needs and use cases, not just at PayPal but our adjacencies like Venmo, Hyperwallet and iZettle. End users of this platform demand access to data insights with as much flexibility as possible to explore it with low processing latency. One such use case is where our Switchboard(data de-multiplexer) platform where we process approximately 20 billion events daily and provide data to different teams and platforms with-in PayPal and also to platform outside PayPal for more insights. When we started building this platform Kafka was just another asynchronous message processing platform for us but we have seen it evolving to a place where its adds value not just in terms of event processing but also for platform resiliency and scalability. Takeaway for the audience: Most people work with and have knowledge about data. With this talk I want to present information which is relevant and meaningful to the audience. Information and examples which will make it easier for attendees to understand our complex system and hopefully have some practical takeaways to use Kafka for similar problems on their hand.

Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...

confluent

Using Apache Kafka to Analyze Session Windows

confluent

ETL can be painful with dirty data and outdated batch processes slowing you down; there has to be a better way. In this talk we’ll discuss the benefits of introducing a streaming platform to your architecture including how it can greatly simplify complexity, speed up performance, and help your team deliver the features they need with real-time data integration. Pandora’s Lawrence Weikum will discuss what they’ve done to bring real-time data integration to the team. We’ll review their Kafka-powered data pipelines and how they make the most of Kafka’s Connect API to make it surprisingly system to keep systems in sync. Presented by: Lawrence Weikum, Senior Software Engineer, Pandora Gehrig Kunz, Technical Product Marketing Manager, Confluent

ETL as a Platform: Pandora Plays Nicely Everywhere with Real-Time Data Pipelines

confluent

Is your organization adopting Kafka as their messaging bus but you've found that it will take too long to migrate your existing service-oriented architecture to a log-oriented architecture? Some of the biggest challenges in building a new stream processor can be implementing all the business logic again. It has become increasingly common for companies with high-throughput source streams and change-data-capture logs to want to build systems fast. At Ticketmaster, we have found a solution to the problem by leveraging the business logic in our existing services and calling them from our Java based KafkaStreams processor applications in an efficient manner. In this talk, we will examine the initial challenges we faced in our transition, then we will explore the solutions we built to address the use cases at Ticketmaster. The primary focus will address our workflow around calling services to bring stream processor applications to market fast. We will review our challenges and share tips for success.

Leveraging services in stream processor apps at Ticketmaster (Derek Cline, Ti...

confluent

Kafka Summit SF 2017 - Query the Application, Not a Database: “Interactive Qu...

confluent

Apache kafka-a distributed streaming platform

confluent

Advanced time series analysis (TSA) requires very special data preparation procedures to convert raw data into useful and compatible formats. In this presentation you will see some typical processing patterns for time series based research, from simple statistics to reconstruction of correlation networks. The first case is relevant for anomaly detection and to protect safety. Reconstruction of graphs from time series data is a very useful technique to better understand complex systems like supply chains, material flows in factories, information flows within organizations, and especially in medical research. With this motivation we will look at typical data aggregation patterns. We investigate how to apply analysis algorithms in the cloud. Finally we discuss a simple reference architecture for TSA on top of the Confluent Platform or Confluent cloud.

Time Series Analysis Using an Event Streaming Platform

Dr. Mirko Kämpf

Deploying a robust streaming data pipeline can be a daunting task when your company’s financial information is at risk. For starters, how do you ensure proper provisioning of resources? How do you preserve end-to-end application and data consistency? How do you make all of this work in the cloud with Kubernetes and avoid YAML hell? Answer: Cloudflow, a new open-source toolkit for simplifying the development, deployment, and operation of streaming data pipelines.

Detecting Real-Time Financial Fraud with Cloudflow on Kubernetes

Lightbend

Real-Time Data Pipelines with Kafka, Spark, and Operational Databases

SingleStore

Lambda architecture: from zero to One

Serg Masyutin

Speaker: Neil Avery, Technologist, Office of the CTO, Confluent Stream processing is now at the forefront of many company strategies. Over the last couple of years we have seen streaming use cases explode and now proliferate the landscape of any modern business. Use cases including digital transformation, IoT, real-time risk, payments microservices and machine learning are all built on the fundamental that they need fast data and they need it at scale. Apache Kafka® has long been the streaming platform of choice, its origins of being dumb pipes for big data have long since been left behind and now it is the goto-streaming platform of choice. Stream processing beckons as being the vehicle for driving those streams, and along with it brings a world of real-time semantics surrounding windowing, joining, correctness, elasticity, and accessibility. The ‘current state of stream processing’ walks through the origins of stream processing, applicable use cases and then dives into the challenges currently facing the world of stream processing as it drives the next data revolution. Neil is a Technologist in the Office of the CTO at Confluent, the company founded by the creators of Apache Kafka. He has over 20 years of expertise of working on distributed computing, messaging and stream processing. He has built or redesigned commercial messaging platforms, distributed caching products as well as developed large scale bespoke systems for tier-1 banks. After a period at ThoughtWorks, he went on to build some of the first distributed risk engines in financial services. In 2008 he launched a startup that specialised in distributed data analytics and visualization. Prior to joining Confluent he was the CTO at a fintech consultancy. Watch the recording: https://videos.confluent.io/watch/rmU6GHrd4EKFaZrRhdTE3s?.

The State of Stream Processing

confluent

Providing a great media consumption experience to customers is crucial to maximizing audience engagement. To do that, it is important that you make content available for consumption anytime, anywhere, on any device, with a personalized and interactive experience. This session explores the power of big data log analytics (real-time and batched), using technologies like Spark, Shark, Kafka, Amazon Elastic MapReduce, Amazon Redshift and other AWS services. Such analytics are useful for content personalization, recommendations, personalized dynamic ad-insertions, interactivity, and streaming quality. This session also includes a discussion from Netflix, which explores personalized content search and discovery with the power of metadata.

Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013

Amazon Web Services

Building Reactive Distributed Systems For Streaming Big Data, Analytics & Mac...

Helena Edelson

Reactive Fast Data & the Data Lake with Akka, Kafka, Spark

Todd Fritz

Milano Apache Kafka Meetup by Confluent (First Italian Kafka Meetup) on Wednesday, November 29th 2017. Il talk introduce Apache Kafka (incluse le APIs Kafka Connect e Kafka Streams), Confluent (la società creata dai creatori di Kafka) e spiega perché Kafka è un'ottima e semplice soluzione per la gestione di stream di dati nel contesto di due delle principali forze trainanti e trend industriali: Internet of Things (IoT) e Microservices.

Introduction to Apache Kafka and Confluent... and why they matter

confluent

Slides from Neha Narkhede's keynote at the dotScale conference in Paris on April 24th, 2017. There is a tectonic shift happening in how data powers the core of a company's business. This shift is about the rise of real-time. Apache Kafka was built with the vision to help companies navigate this change and become the central nervous system that makes data available in real-time to all the applications that need to use it. This talk is about how you can put Apache Kafka to practice to help your company make this shift to real-time. And how the Connect and Streams API in Apache Kafka capture the entire scope of what it means to put streams into practice.

dotScale 2017 Keynote: The Rise of Real Time by Neha Narkhede

confluent

The Netflix Way to deal with Big Data Problems

Monal Daxini

Presentation by Ewen Cheslack-Postava, Engineer, Apache Kafka Committer, Confluent In streaming workloads, often times data produced at the source is not useful down the pipeline or it requires some transformation to get it into usable shape. Similarly, where sensitive data is concerned, filtering of topics is helpful to ensure that the wrong data doesn't get to the wrong place. The newest release of Apache Kafka now offers the ability to do transformations on individual messages, making is possible to implement finer grained transformations customized to your unique needs. In this session we’ll talk about the new single message transform capabilities, how to use them to implement things like data masking and advanced partitioning, and when you’ll need to use more complex tools like the Kafka Streams API instead.

Data Pipelines Made Simple with Apache Kafka

confluent

Tendances (20)

Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive

Running Kafka for Maximum Pain

Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...

Using Apache Kafka to Analyze Session Windows

ETL as a Platform: Pandora Plays Nicely Everywhere with Real-Time Data Pipelines

Leveraging services in stream processor apps at Ticketmaster (Derek Cline, Ti...

Kafka Summit SF 2017 - Query the Application, Not a Database: “Interactive Qu...

Apache kafka-a distributed streaming platform

Time Series Analysis Using an Event Streaming Platform

Detecting Real-Time Financial Fraud with Cloudflow on Kubernetes

Real-Time Data Pipelines with Kafka, Spark, and Operational Databases

Lambda architecture: from zero to One

The State of Stream Processing

Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013

Building Reactive Distributed Systems For Streaming Big Data, Analytics & Mac...

Reactive Fast Data & the Data Lake with Akka, Kafka, Spark

Introduction to Apache Kafka and Confluent... and why they matter

dotScale 2017 Keynote: The Rise of Real Time by Neha Narkhede

The Netflix Way to deal with Big Data Problems

Data Pipelines Made Simple with Apache Kafka

Similaire à How Credit Karma Makes Real-Time Decisions For 60 Million Users With Akka Streams And Actors

We produce quite a lot of data. Some of this data comes in the form of business transactions and is stored in a relational database. This relational data is often combined with other non-structured, high volume and rapidly changing datasets known in the industry as Big Data. The challenge for us as data integration professionals is to then combine this data and transform it into something useful. Not just that, but we must also do it in near real-time and using a big data target system such as Hadoop. The topic of this session, real-time data streaming, provides us a great solution for that challenging task. By combining GoldenGate, Oracle’s premier data replication technology, and Apache Kafka, the latest open-source streaming and messaging system for big data, we can implement a fast, durable, and scalable solution. This session will walk through the implementation of GoldenGate and Kafka. Presented at Collaborate16 in Las Vegas.

Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming

Michael Rainey

We produce quite a lot of data! Much of the data are business transactions stored in a relational database. More frequently, the data are non-structured, high volume and rapidly changing datasets known in the industry as Big Data. The challenge for data integration professionals is to combine and transform the data into useful information. Not just that, but it must also be done in near real-time and using a target system such as Hadoop. The topic of this session, real-time data streaming, provides a great solution for this challenging task. By integrating GoldenGate, Oracle’s premier data replication technology, and Apache Kafka, the latest open-source streaming and messaging system, we can implement a fast, durable, and scalable solution. Presented at Oracle OpenWorld 2016

Oracle GoldenGate and Apache Kafka: A Deep Dive Into Real-Time Data Streaming

Michael Rainey

Watch our latest quarterly customer education webcast to learn about the latest advancements in Syncsort DMX and DMX-h data integration software, including our new product DMX Change Data Capture (CDC). Many of our customers use DMX-h to quickly and efficiently populate their data lakes with enterprise-wide data, to power a variety of use cases, including data as a service, data archiving, fraud detection, and Customer 360. But, as important as it is to populate the data lake, it’s equally important to keep that data current for accurate decision making. DMX Change Data Capture makes it easy and efficient to keep your data lake fresh after the initial load with real-time data replication that continually applies changes made on your traditional systems to your cluster.

Big Data Q2 Customer Education Webcast: New DMX Change Data Capture for Hadoo...

Precisely

[Public] 7 archetipi della tecnologia moderna [italy]

Nicolas Bortolotti

"Data mesh is a relatively recent architectural innovation, espoused as one of the best ways to fix analytic data. We renegotiate aged social conventions by focusing on treating data as a product, with a clearly defined data product owner, akin to that of any other product. In addition, we focus on building out a self-service platform with integrated governance, letting consumers safely access and use the data they need to solve their business problems. Data mesh is prescribed as a solution for _analytical data_, so that conventionally analytical results (think weekly sales or monthly revenue reports) can be more accurately and predictably computed. But what about non-analytical business operations? Would they not also benefit from data products backed by self-service capabilities and dedicated owners? If you've ever provided a customer with an analytical report that differed from their operational conclusions, then this talk is for you. Adam discusses the resounding successes he has seen from applying data mesh _off-label_ to both analytical and operational domains. The key? Event streams. Well-defined, incrementally updating data products that can power both real-time and batch-based applications, providing a single source of data for a wide variety of application and analytical use cases. Adam digs into the common areas of success seen across numerous clients and customers and provides you with a set of practical guidelines for implementing your own minimally viable data mesh. Finally, Adam covers the main social and technical hurdles that you'll encounter as you implement your own data mesh. Learn about important data use cases, data domain modeling techniques, self-service platforms, and building an iteratively successful data mesh."

Off-Label Data Mesh: A Prescription for Healthier Data

HostedbyConfluent

As organizations modernize their data and analytics platforms, the data lake concept has gained momentum as a shared enterprise resource for supporting insights across multiple lines of business. The perception is that data lakes are vast, slow-moving bodies of data, but innovations like Apache Kafka for streaming-first architectures put real-time data flows at the forefront. Combining real-time alerts and fast-moving data with rich historical analysis lets you respond quickly to changing business conditions with powerful data lake analytics to make smarter decisions. Join this complimentary webinar with industry experts from 451 Research and Arcadia Data who will discuss: - Business requirements for combining real-time streaming and ad hoc visual analytics. - Innovations in real-time analytics using tools like Confluent’s KSQL. - Machine-assisted visualization to guide business analysts to faster insights. - Elevating user concurrency and analytic performance on data lakes. - Applications in cybersecurity, regulatory compliance, and predictive maintenance on manufacturing equipment all benefit from streaming visualizations.

Accelerating Data Lakes and Streams with Real-time Analytics

Arcadia Data

Apache Spark 2.0 set the architectural foundations of Structure in Spark, Unified high-level APIs, Structured Streaming, and the underlying performant components like Catalyst Optimizer and Tungsten Engine. Since then the Spark community has continued to build new features and fix numerous issues in releases Spark 2.1 and 2.2. Continuing forward in that spirit, the upcoming release of Apache Spark 2.3 has made similar strides too, introducing new features and resolving over 1300 JIRA issues. In this talk, we want to share with the community some salient aspects of soon to be released Spark 2.3 features: • Kubernetes Scheduler Backend • PySpark Performance and Enhancements • Continuous Structured Streaming Processing • DataSource v2 APIs • Structured Streaming v2 APIs

What's New in Upcoming Apache Spark 2.3

Databricks

The “People You May Know” (PYMK) recommendation service helps LinkedIn’s members identify other members that they might want to connect to and is the major driver for growing LinkedIn's social network. The principal challenge in developing a service like PYMK is dealing with the sheer scale of computation needed to make precise recommendations with a high recall. PYMK service at LinkedIn has been operational for over a decade, during which it has evolved from an Oracle-backed system that took weeks to compute recommendations to a Hadoop backed system that took a few days to compute recommendations to its most modern embodiment where it can compute recommendations in near real time. This talk will present the evolution of PYMK to its current architecture. We will focus on various systems we built along the way, with an emphasis on systems we built for our most recent architecture, namely Gaia, our real-time graph computing capability, and Venice our online feature store with scoring capability, and how we integrate these individual systems to generate recommendations in a timely and agile manner, while still being cost-efficient. We will briefly talk about the lessons learned about scalability limits of our past and current design choices and how we plan to tackle the scalability challenges for the next phase of growth. https://qcon.ai/qconai2019/presentation/people-you-may-know-fast-recommendations-over-massive-data

[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive Data

Sumit Rangwala

With components like Spark SQL, MLlib, and Streaming, Spark is a unified engine for building data applications. In this talk, we will take a look at how we use Spark on our own Databricks platform throughout our data pipeline for use cases such as ETL, data warehousing, and real time analysis. We will demonstrate how these applications empower engineering and data analytics. We will also share some lessons learned from building our data pipeline around security and operations. This talk will include examples on how to use Structured Streaming (a.k.a Streaming DataFrames) for online analysis, SparkR for offline analysis, and how we connect multiple sources to achieve a Just-In-Time Data Warehouse.

A Journey into Databricks' Pipelines: Journey and Lessons Learned

Databricks

2018 02-08-what's-new-in-apache-spark-2.3

Chester Chen

Presto @ Zalando - Big Data Tech Warsaw 2020

Piotr Findeisen

Talk by Gordon Guthrie, Senior Software Engineer at Basho Summary A review of the CAP Theorem and the difficulties of resolving conflicts in highly distributed systems. Covering the issues and various theories on how to resolve including the use CRDTs in Riak Details CRDTs are used to replicate data across multiple computers in a network, executing updates without the need for remote synchronisation. This leads to merge conflicts in systems using conventional eventual consistency technology, but CRDTs are designed such that conflicts are mathematically impossible. Under the constraints of the CAP theorem they provide the strongest consistency guarantees for available/partition-tolerant (AP) settings. The CRDT concept was first formally defined in 2007 by Marc Shapiro and Nuno Preguiça in terms of operation commutativity, and development was initially motivated by collaborative text editing. The concept of semilattice evolution of replicated states was first defined by Baquero and Moura in 1997, and development was initially motivated by mobile computing. The two concepts were later unified in 2011. Basho has worked with the EU and Marc Shapiro's team to push CRDTs into distributed systems. Riak v2.x is the first commercial product to include this functionality

Convergent Replicated Data Types in Riak 2.0

Big Data Spain

"Unlock the full potential of your streaming applications with Kafka! As a data engineer, are you eager to supercharge the performance of your streaming workflows? Join us in this session where we dive deep into the intricate integration of Kafka and Spark Structured Streaming. Explore the inner workings, discover control options, and unravel the anatomy of seamless data flow. In this engaging presentation, we'll unravel the inner workings of Kafka, explore its collaboration with Structured Streaming, and scrutinize the various options for stream control. What sets this session apart is our dedicated focus on the common pitfalls – we'll extensively discuss and dissect these challenges. From practical tips to proven techniques, we'll guide you through overcoming these challenges in your data pipelines. Join us for a session filled with insights that not only highlight the challenges but empower you to turn them into opportunities for exceptional results in your streaming applications."

How Credit Karma Makes Real-Time Decisions For 60 Million Users With Akka Streams And Actors

1. Using Akka Streams For Real Time Decision Making
Dustin Lyons, Engineering Manager, Data Platform
Proprietary & Confidential

2. Who I am
● Engineer turned Engineering Manager at Credit Karma
● Data & Analytics on the Platform team
● Build things that make decisions on where data should go
● Lover of science fiction, sushi, and electronic music

3. Credit Karma is a free financial assistant, helping over 60 million people make progress.

4. Agenda for today
   1. Data Infrastructure at Credit Karma: Past and current
   2. Mo’ data, mo’ problems
   3. Akka Streams saves the day
   4. Results and learnings
   5. Q&A

5. Data scale (MB/min) @ Credit Karma

6. Credit Karma data platform: PHP days (diagram label: PHP Scripts)

7. New tools to help with scale

8. Credit Karma data platform: Scala in 2014 (diagram label: Data Warehouse Import)

9. New tools to help with concurrency

10. Credit Karma data platform: Akka in 2015 (diagram label: Analytics Export Service + Data Warehouse Import)

11. Credit Karma data platform: Akka in 2015 (diagram label: Analytics Export Service + Data Warehouse Import)

12. Analytics export service (diagram labels: Coordinator; Data Transformer Workers; Kafka Importer Workers; Analytics Export Service; HTTP Ingest Server)

13. Analytics export service

14. Analytics export service (diagram labels: Coordinator; Data Transformer Workers; Kafka Importer Workers; Analytics Export Service; HTTP Ingest Server)

15. Analytics export service

16. Data warehouse import (diagram labels: Reader; Deduplicator; Processor; Extractors; Data Warehouse Import Service)

17. Data warehouse import

18. Marble maze

19. Marble maze

20. Marble maze

21. Marble maze

22. Marble maze
   (1) Reading from file

23. Marble maze
   (1) Reading from file
   (2) Waiting for external service

24. Marble maze
   (1) Reading from file
   (2) Waiting for external service
   (3) Objects sit in heap

25. Marble maze
   (1) Reading from file
   (2) Waiting for external service
   (3) Objects sit in heap
   (4) Database Insert

26. Backpressure

27. What is backpressure?
Backpressure refers to the buildup of data at an I/O switch when buffers are full and cannot receive additional data. No additional data packets are transferred until the bottleneck has been eliminated or the buffer has been emptied.
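The definition on this slide can be illustrated with a small sketch. It is written in plain Python rather than Akka's Scala API, and the function name `run_pipeline`, the buffer size, and the consumer that takes one item per round are all assumptions made for illustration, not something shown in the talk. The point is only that a full buffer stops the producer, so the amount of data held in memory stays bounded.

```python
import queue


def run_pipeline(items, buffer_size):
    """Push items through a bounded buffer (buffer_size >= 1).

    The producer pauses whenever the buffer is full and resumes
    once the consumer has made room.
    """
    buffer = queue.Queue(maxsize=buffer_size)
    pending = list(items)   # data the producer still wants to send
    delivered = []          # data the consumer has processed
    stalls = 0              # times the producer was told to wait
    max_buffered = 0        # high-water mark of the buffer

    while pending or not buffer.empty():
        # Producer side: send until the buffer refuses more data.
        while pending:
            try:
                buffer.put_nowait(pending[0])
                pending.pop(0)
            except queue.Full:
                stalls += 1  # backpressure: nothing more is sent for now
                break
        max_buffered = max(max_buffered, buffer.qsize())

        # Consumer side: a slow consumer handles one item per round.
        delivered.append(buffer.get_nowait())

    return delivered, stalls, max_buffered


delivered, stalls, max_buffered = run_pipeline(range(10), buffer_size=3)
```

With an unbounded buffer the producer would never stall and every pending item would sit in memory at once, which is the "objects sit in heap" situation from the marble maze slides.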

28. Analytics export service (diagram labels: Coordinator; Data Transformer Workers; Kafka Importer Workers; Analytics Export Service; HTTP Ingest Server)

29. Analytics export service (diagram labels: Coordinator; Data Transformer Workers; Kafka Importer Workers; Analytics Export Service; HTTP Ingest Server)

30. Analytics export service (diagram labels: Coordinator; Data Transformer Workers; Kafka Importer Workers; Analytics Export Service; HTTP Ingest Server)

31. Data warehouse import (diagram labels: Reader; Deduplicator; Processor; Extractors; Data Warehouse Import Service)

32. Akka Streams: Backpressure in action (diagram labels: Actor; Actor; Data; Demand)
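The "Data" and "Demand" labels on this slide suggest demand flowing one way and data the other. A minimal sketch of that idea follows, in plain Python rather than Akka. The class names `Upstream` and `Downstream` and the methods `request`, `on_next`, and `on_complete` are hypothetical names chosen for this example (loosely modelled on Reactive Streams terminology), not Akka's internals.

```python
class Upstream:
    """Emits elements only after the downstream has signalled demand."""

    def __init__(self, elements):
        self._iterator = iter(elements)
        self.emitted = 0

    def request(self, n, downstream):
        # Send at most n elements: demand bounds how much data may flow.
        for _ in range(n):
            try:
                element = next(self._iterator)
            except StopIteration:
                downstream.on_complete()
                return
            self.emitted += 1
            downstream.on_next(element)


class Downstream:
    """Asks for data in small batches it knows it can handle."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.received = []
        self.done = False

    def on_next(self, element):
        self.received.append(element)

    def on_complete(self):
        self.done = True

    def run(self, upstream):
        # Keep signalling demand until the upstream reports completion.
        while not self.done:
            upstream.request(self.batch_size, self)


upstream = Upstream(range(7))
downstream = Downstream(batch_size=3)
downstream.run(upstream)
```

Because the upstream never sends more than was requested, a slow downstream simply requests less often and nothing piles up in between.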

33. Akka Streams: Creating a stream (diagram: Source → Flow → Sink)
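The Source, Flow, Sink shape can be sketched with Python generators. This is an analogy under stated assumptions, not Akka code: the helper names `source`, `map_stage`, `filter_stage`, `grouped`, and `fold_sink` are invented for this example and only echo the stage names used in the talk. Generators are pull-based, so each stage produces an element only when the stage after it asks for one.

```python
from functools import reduce
from itertools import islice


def source(start, stop):
    """Source: lazily yields the integers in [start, stop)."""
    yield from range(start, stop)


def map_stage(fn, upstream):
    """Flow: transforms each element as it is pulled."""
    return (fn(x) for x in upstream)


def filter_stage(predicate, upstream):
    """Flow: passes through only the elements matching the predicate."""
    return (x for x in upstream if predicate(x))


def grouped(size, upstream):
    """Flow: collects elements into lists of at most `size` items."""
    iterator = iter(upstream)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def fold_sink(zero, fn, upstream):
    """Sink: pulls everything through the pipeline and folds it into one value."""
    return reduce(fn, upstream, zero)


def run_example():
    evens = filter_stage(lambda x: x % 2 == 0, source(1, 11))
    scaled = map_stage(lambda x: x * 10, evens)
    batches = grouped(2, scaled)
    # Nothing has run yet; the sink drives the whole pipeline.
    return fold_sink([], lambda acc, batch: acc + [sum(batch)], batches)


result = run_example()
```

Here `result` holds one sum per batch of two scaled even numbers.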

34. Akka Streams: Built in stages
Built In Sources
• actorRef
• actorPublisher
• fromIterator
• fromFile
• Apply (from a Seq)
Built In Processing Stages
• map
• filter
• grouped
• drop/take
• dropWhile/takeWhile
• sliding
Built In Sinks
• head
• last
• seq
• foreach
• actorRef
• actorSubscriber
• reduce
• fold
Backpressure Aware Stages
• mapAsync
• buffer (Backpressure)
• batch
• buffer (Drop)
• buffer (Fail)
Reference: http://doc.akka.io/docs/akka/current/scala/stream/stages-overview.html
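The three buffer variants listed above (Backpressure, Drop, Fail) differ only in what happens when the buffer is full. The sketch below shows that difference in plain Python. The `BoundedBuffer` class, its `offer` and `poll` methods, and the choice to drop the newest element are assumptions for illustration and do not reproduce Akka's `buffer` stage.

```python
from collections import deque


class BufferOverflow(Exception):
    """Raised by the 'fail' strategy when the buffer is full."""


class BoundedBuffer:
    def __init__(self, capacity, strategy):
        if strategy not in ("backpressure", "drop", "fail"):
            raise ValueError(f"unknown strategy: {strategy}")
        self.capacity = capacity
        self.strategy = strategy
        self.items = deque()
        self.dropped = 0

    def offer(self, element):
        """Return True if the upstream may continue, False if it must retry later."""
        if len(self.items) < self.capacity:
            self.items.append(element)
            return True
        if self.strategy == "backpressure":
            return False       # upstream keeps the element and waits
        if self.strategy == "drop":
            self.dropped += 1  # element is discarded, upstream carries on
            return True
        raise BufferOverflow(f"buffer of {self.capacity} is full")

    def poll(self):
        """Remove and return the oldest buffered element."""
        return self.items.popleft()


# Backpressure: the third element is refused until room is made.
waiting = BoundedBuffer(2, "backpressure")
accepted = [waiting.offer(x) for x in (1, 2, 3)]
waiting.poll()
retried = waiting.offer(3)

# Drop: the upstream is never slowed, but data is lost.
lossy = BoundedBuffer(2, "drop")
for x in range(5):
    lossy.offer(x)
```

Backpressure trades speed for completeness, drop trades completeness for speed, and fail surfaces the overload as an error.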

35. Analytics export service (diagram labels: Coordinator; Data Transformer Workers; Kafka Importer Workers; Analytics Export Service; HTTP Ingest Server)

36. Analytics export service (diagram labels: Coordinator; Analytics Export Service; HTTP Ingest Server; Akka Stream)

37. Analytics export service

38. Data warehouse import (diagram labels: Reader; Deduplicator; Processor; Extractors; Data Warehouse Import Service)

39. Data warehouse import (diagram labels: Extractors; Data Warehouse Import Service; Akka Stream)

40. Data warehouse import service

41. Analytics export service heap (before)
[Chart: heap in GiB over time, 28 GiB marked. Red: Heap Space; Blue: Used Heap Space; Purple: Max Heap Space]

42. Analytics export service heap (after)
[Chart: heap in GiB over time, 28 GiB marked]

43. Data warehouse import

44. Data warehouse import

45. Data warehouse import

46. Final results
• Akka Streams allowed us to move data with increased throughput and optimal performance
• No longer getting paged for JVM out of memory or spending time tuning our services
• Reduced the SLA for data delivery to our business stakeholders

47. Lessons learned
• Akka Actors: Great for low latency
• Akka Streams: Optimized for high throughput and solving back pressure
• Built on top of Akka Actors
• Don’t try to build high throughput systems with an actor system, you’ll just start building Akka Streams

48. Thank you! Q&A
Dustin Lyons, Engineering Manager, Data Platform
