Streaming Reactive Systems & Data Pipes with squbs
Watch the video with slide synchronization on InfoQ.com:
https://www.infoq.com/presentations/squbs
Presented at QCon London
www.qconlondon.com
Once upon a time…
• There was (is still) a need to make JVM programming more
  • efficient
  • reliable
  • simple
• It was (is still) hard to write concurrent programs
• Mix & match of libs made (still makes) things worse and unsupportable
Thinking in Streams
Streams of events  services
Shall we not just describe processing & transformations?
Why are we still thinking in locks and synchronization?
The universe can be thought of as streams of events
Core Concepts
Linear Stages: source, flow, sink
Fan In/Out: multiple in/out ports
BidiFlow: easy to stack on each other
Materializer: creates a network of running entities
Materialized Values: Results
Back-Pressure: Keeps streams resilient under load
[Diagram: stage shapes Source, Flow, Sink, Fan-In, Fan-Out, and BidiFlow]
Composition & Componentization
Stream declarations are immutable templates
Can be produced from a function
Can be materialized multiple times
Parts are immutable, composable
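To make the "immutable template" idea concrete, here is a minimal sketch that uses plain java.util.stream as a stand-in for a streaming library. It is an analogy only, not Akka Streams or squbs code, and every name in it (BlueprintSketch, numbers, EVENS, LABEL, COMPOSITE, run) is hypothetical. It shows a blueprint produced from a function, two parts composed into one, and the same blueprint run twice.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical names throughout; java.util.stream stands in for a streaming library.
class BlueprintSketch {
    // "Produced from a function": returns a source blueprint, nothing runs yet.
    static Supplier<Stream<Integer>> numbers(int n) {
        return () -> Stream.iterate(1, i -> i + 1).limit(n);
    }

    // Two small flow parts, each an immutable value.
    static final Function<Stream<Integer>, Stream<Integer>> EVENS =
        s -> s.filter(i -> i % 2 == 0);
    static final Function<Stream<Integer>, Stream<String>> LABEL =
        s -> s.map(i -> "item-" + i);

    // Composition: two parts become one composite flow.
    static final Function<Stream<Integer>, Stream<String>> COMPOSITE =
        EVENS.andThen(LABEL);

    // "Materialization": each call starts a fresh run of the same blueprint.
    static List<String> run(Supplier<Stream<Integer>> source) {
        return COMPOSITE.apply(source.get()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Supplier<Stream<Integer>> source = numbers(4);
        System.out.println(run(source)); // [item-2, item-4]
        System.out.println(run(source)); // same blueprint, second independent run
    }
}
```

Because the parts are plain values, a composite can be passed around and reused like any single part, which is the property the slide attributes to stream components.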
Composition of Stream Components
[Diagram, built up over several slides: Source, Flow, and Sink stages are grouped step by step into a Composite Source, a Composite Flow, and a Composite Sink; composites can also be nested]
Sample 1: Big Data Collectors/Enrichers
[Diagram: HTTP Flows (Req, Extract, Respond, Resp) feed a MergeHub; the Enrichment Flow consists of MergeHub, Enrich Flow, Persistent Buffer, and Kafka Sink]
Persistent Buffer: deals with Kafka rebalancing/unavailability
// PerpetualStream: Enrichment Flow
// Beacons from all HTTP flows are merged, enriched, buffered on disk, and sent to Kafka.
def streamGraph = MergeHub.source[Beacon]
  .via(enrichFlow)
  .via(PersistentBuffer[EnrichedBeacon](new File("/var/tmp/pb")))
  .map(bcn => new ProducerRecord[Array[Byte], EnrichedBeacon]("beacons", bcn))
  .toMat(Producer.plainSink(settings))(Keep.both)

// FlowDefinition: HTTP Flow
// Look up the materialized value of the enrichment stream; its first element is the MergeHub sink.
val (enrichStream, _) = matValue("/user/enrich/enrichstream")

def flow = Flow[HttpRequest]
  .mapAsync(1)(Unmarshal(_).to[Beacon])
  .alsoTo(enrichStream)
  .map(beacon => HttpResponse(entity = s"Received Id: ${beacon.id}"))
Sample 2: Micro-batching Event Pre-processor
[Diagram: Event Source, User Flow, Fee Flow, Processor, and Sink; items are processed one by one ("Process Item"), while the User API and Fee API are called in bulk]
// Main pipeline: run the item source through the user, fee, and completion stages
CompletionStage<Optional<PreprocessEnvelope>> preprocessing =
    Source.fromIterator(() -> new PreprocessItemSource(prop, seedEnvelope, system()))
        .via(userFlow)
        .via(feeFlow)
        .via(completeProcessing)
        .runWith(Sink.lastOption(), materializer);

// User flow: batch items, make one retried client call per batch, then flatten the results
Flow.of(PreprocessEnvelope.class)
    .grouped(config.getAsInt("user.group-size", 50))
    .map(accountLoad::populateAccountRequest)
    .via(retry.join(userClientFlow))
    .map(accountLoad::processAccountResponse)
    .mapConcat(o -> o);

// Fee flow: split by country, then batch by size or by a 1-second window
Flow.of(PreprocessEnvelope.class)
    .groupBy(195, it -> it.userInfo.country)
    .groupedWithin(config.getAsInt("fee.group-size", 50),
        Duration.create(1, TimeUnit.SECONDS))
    …
    .map(this::populatePricingRequest)
    .via(pricingClientFlow)
    .map(this::processResponse)
    .mapConcat(o -> o);
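The grouped / bulk call / mapConcat pattern above can be sketched without any streaming library. The following is a simplified illustration on a finite list, assuming a hypothetical bulk function that takes a batch and returns one result per item. The names MicroBatchSketch, grouped, and processInBatches are made up for this sketch and are not part of Akka Streams or squbs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not library code: micro-batching over a finite list.
class MicroBatchSketch {
    // Like "grouped": split items into batches of at most `size` elements.
    static <T> List<List<T>> grouped(List<T> items, int size) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            int end = Math.min(i + size, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    // One bulk call per batch, then flatten the per-batch results (like "mapConcat").
    static <T, R> List<R> processInBatches(List<T> items, int size,
                                           Function<List<T>, List<R>> bulkCall) {
        List<R> results = new ArrayList<>();
        for (List<T> batch : grouped(items, size)) {
            results.addAll(bulkCall.apply(batch));
        }
        return results;
    }

    public static void main(String[] args) {
        List<Integer> items = List.of(1, 2, 3, 4, 5);
        System.out.println(grouped(items, 2)); // [[1, 2], [3, 4], [5]]

        // Stand-in for a bulk API: doubles every element of the batch.
        List<Integer> doubled = processInBatches(items, 2, batch -> {
            List<Integer> out = new ArrayList<>();
            for (int x : batch) out.add(x * 2);
            return out;
        });
        System.out.println(doubled); // [2, 4, 6, 8, 10]
    }
}
```

With a batch size of 2, five items cost three bulk calls instead of five single calls, which is the saving micro-batching is after. A time-windowed variant such as groupedWithin additionally closes a batch when the window expires, which this sketch does not model.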
A More Complex Stream
[Diagram: stream graph combining HTTP flows (Req, Accept, Respond, Resp), MergeHub, Kafka Source, Partition, Metadata Client, Merge, Store 1, Store 2, and two Data Sinks]
Stream Programming Reliability
Analytics
• Less need for reliability
• 1% message loss acceptable
Transactional
• Absolute reliability
• Allowed to lose your payment?
Traditional expectations: Not very reliable
Stream Reliability & Resilience
Streams back-pressure
Back-pressure
[Animation: Upstream, Main Flow, Downstream, with a Demand / No demand legend. Demand travels from Downstream toward Upstream as a series of pulls; once it reaches Upstream, elements travel back down as a series of pushes]
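The pull-then-push cycle can be tried out with the JDK's own Reactive Streams interfaces. This is a minimal sketch that uses java.util.concurrent.Flow and SubmissionPublisher rather than Akka Streams. The class names BackPressureSketch and OneAtATime are hypothetical, and the buffer size of 2 is an arbitrary choice for illustration. The subscriber signals demand for one element at a time, and the publisher cannot run ahead of that demand by more than its small buffer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

// Hypothetical demo class; uses only JDK 9+ APIs.
class BackPressureSketch {
    /** Downstream that requests exactly one element at a time. */
    static final class OneAtATime implements Flow.Subscriber<Integer> {
        final List<Integer> received = new ArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1);                // initial demand: the "pull"
        }
        @Override public void onNext(Integer item) {
            received.add(item);          // element delivered: the "push"
            subscription.request(1);     // ready for one more
        }
        @Override public void onError(Throwable t) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    }

    static List<Integer> run(int count) {
        OneAtATime downstream = new OneAtATime();
        ExecutorService consumerThread = Executors.newSingleThreadExecutor();
        try {
            try (SubmissionPublisher<Integer> upstream =
                     new SubmissionPublisher<>(consumerThread, 2)) {
                upstream.subscribe(downstream);
                for (int i = 0; i < count; i++) {
                    upstream.submit(i);  // blocks while the small buffer is full
                }
            }                            // close() completes the stream after delivery
            downstream.done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            consumerThread.shutdown();
        }
        return downstream.received;
    }

    public static void main(String[] args) {
        System.out.println(run(5)); // [0, 1, 2, 3, 4]
    }
}
```

Note that submit() blocks the producer when the buffer is full, which is how this JDK class applies back-pressure. Akka Streams propagates demand without blocking threads, so the sketch illustrates the demand protocol only, not the threading model.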
Stream Reliability & Resilience
Streams back-pressure
Gate high-risk regions
  With timeout, retry, or circuit-breaker
Gating High Risk Regions
[Diagram: the High Risk Flow sits inside the Main Flow and is wrapped by a Retry Gate]
val settings = Settings(max = 2).withDelay(1 second)
val gatedFlow = Retry(settings).join(highRiskFlow)
upstream.via(gatedFlow).via(downstream)
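What a retry gate does for a single element can be sketched with plain JDK futures. This is not the squbs Retry stage: the names RetrySketch, withRetry, and demo are hypothetical, and reading "max = 2" as "up to two retries after the first attempt" is an assumption of this sketch. Retries are scheduled on a timer instead of sleeping, so no thread is blocked while waiting for the delay.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch of retry-with-delay for one element; not the squbs implementation.
class RetrySketch {
    /** Runs `call`; on failure, schedules up to `maxRetries` further attempts. */
    static <T> CompletableFuture<T> withRetry(Supplier<CompletableFuture<T>> call,
                                              int maxRetries, long delayMillis,
                                              ScheduledExecutorService timer) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(call, maxRetries, delayMillis, timer, result);
        return result;
    }

    private static <T> void attempt(Supplier<CompletableFuture<T>> call, int retriesLeft,
                                    long delayMillis, ScheduledExecutorService timer,
                                    CompletableFuture<T> result) {
        call.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);              // success: pass the element on
            } else if (retriesLeft > 0) {
                timer.schedule(                      // failure: try again after the delay
                    () -> attempt(call, retriesLeft - 1, delayMillis, timer, result),
                    delayMillis, TimeUnit.MILLISECONDS);
            } else {
                result.completeExceptionally(error); // retries exhausted: give up
            }
        });
    }

    /** Demo: a stand-in risky call that fails `failures` times, then returns "ok". */
    static String demo(int failures, int maxRetries) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger calls = new AtomicInteger();
        Supplier<CompletableFuture<String>> risky = () -> {
            if (calls.incrementAndGet() <= failures) {
                return CompletableFuture.failedFuture(new IllegalStateException("flaky"));
            }
            return CompletableFuture.completedFuture("ok");
        };
        try {
            return withRetry(risky, maxRetries, 10, timer)
                .handle((v, e) -> (e == null ? "ok" : "failed")
                    + " after " + calls.get() + " call(s)")
                .join();
        } finally {
            timer.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(2, 2)); // ok after 3 call(s)
        System.out.println(demo(3, 2)); // failed after 3 call(s)
    }
}
```

A stream stage has one more job that this per-element sketch leaves out: it must stop pulling new elements while its retry queue is full, so that back-pressure still reaches upstream.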
Dissecting the Retry Stage
[Animation: the Retry Gate is drawn as a stage with ports in1, out1, in2, and out2, placed between the Main Flow and the High Risk Flow, with a Demand / No demand legend. The frames step through these cases:
- Happy Case: pulls followed by pushes
- Error Case: pushes followed by pulls
- Timer Retry Case
- Pull Retry Case
- Retry Queue Full]
Gating High Risk Regions
[Diagram: Main Flow with the High-Risk Flow behind the Retry Gate]
☑ Back-Pressure Propagated
Learnings from Retry Stage
Be very careful “when” you pull
Timers are only useful when there is demand
No demand? Use onPull to check for retries
These components are hard to get right
Retry Gate source: https://github.com/paypal/squbs/blob/master/squbs-ext/src/main/scala/org/squbs/streams/Retry.scala
Stream Reliability & Resilience
Streams back-pressure
Gate high-risk regions
  With timeout, retry, or circuit-breaker
Reassembling streams order
Gating High Risk Regions + Re-ordering
[Diagram: Main Flow, High-Risk Flow, and Retry Gate, shown first without and then with re-ordering]
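Retried elements can leave the gate later than elements that succeeded on the first attempt, so the original order has to be restored afterwards. The slides do not show how the re-ordering is done; the following is one common approach, sketched under the assumption that each element carries a sequence number starting at 0. The class ResequencerSketch and its offer method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical resequencer: buffers out-of-order elements until the gap is filled.
class ResequencerSketch<T> {
    private final Map<Long, T> pending = new HashMap<>();
    private long next = 0; // sequence number that must be emitted next

    /** Accepts one element and returns every element that can now be emitted in order. */
    List<T> offer(long seq, T element) {
        pending.put(seq, element);
        List<T> ready = new ArrayList<>();
        while (pending.containsKey(next)) {
            ready.add(pending.remove(next));
            next++;
        }
        return ready;
    }

    public static void main(String[] args) {
        ResequencerSketch<String> r = new ResequencerSketch<>();
        System.out.println(r.offer(1, "b")); // []      (still waiting for 0)
        System.out.println(r.offer(0, "a")); // [a, b]
        System.out.println(r.offer(3, "d")); // []      (still waiting for 2)
        System.out.println(r.offer(2, "c")); // [c, d]
    }
}
```

The buffer in this sketch is unbounded. A real stage would need to bound it and decide what to do about an element that never arrives, for example one dropped after its retries are exhausted.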
Stream Reliability & Resilience
Streams back-pressure
Gate high-risk regions
  With timeout, retry, or circuit-breaker
Reassembling streams order
Clean lifecycle
  Prevents message loss at node shutdown
Benefits of Stream-Based Services
✔ Simpler and smaller code
✔ Highly composable
✔ More reliable and resilient
✔ More efficient
More Efficient, Fewer Errors
“We saw an 80% reduction in processing time with the new stack and a near 0% failure rate.”
– Balaji Srinivasaraghavan
[Charts: August 2017 (before squbs/Akka Streams) vs. October 2017 (with squbs/Akka Streams)]
https://www.paypal-engineering.com/learnings-from-using-a-reactive-platform-akkasqubs
Takeaways
The problem may be somewhere else
Always consume HTTP responses
Have Monitoring
Never block
Use Akka Stream Testkit
In Conclusion, we hope you…
Know how to think in streams for developing services
Feel that stream-based development can solve a large part of complexity, resiliency, and efficiency problems
Go try out building stream-based systems
http://clipground.com
Q&A – Feedback Appreciated
