1
The art of the event streaming application.
streams, stream processors and scale
Neil Avery,
Office of the CTO,
@avery_neil
2
3
Out of the tar pit, 2006
4
“We believe that the major contributor to this complexity in many systems is the handling of state and the burden that this adds when trying to analyse and reason about the system.”
Out of the tar pit, 2006
5
We like ‘all the things’
6
7
What about Microservices?
8
What are microservices?
Microservices are a software development technique - a variant of the service-oriented architecture (SOA) architectural style that structures an application as a collection of loosely coupled services.
https://en.wikipedia.org/wiki/Microservices
9
structures an application as a collection of loosely coupled services.
https://en.wikipedia.org/wiki/Microservices
this is new!
10
So what went wrong?
11
Too many cooks
12
Making changes is risky
13
Handling state is hard
Cache?
Embedded?
Route to right instance?
14
Shifts in responsibility, redundancy
15
What have we learned about microservices?
● Scaling is hard
● Handling state is hard
● Sharing, coordinating is hard
● Running a database in each microservice is hard
16
Microservices
We had it all wrong (again)
FIX: Make them asynchronous and use streams of events
We didn’t really know what they were anyway
17
What's the big idea?
18
Event-driven architectures aren’t new
...but the world has changed
19
New technology, requirements and expectations
20
Events
FACT!
SOMETHING HAPPENED!
21
An Event records the fact that something happened
A good was sold
An invoice was issued
A payment was made
A new customer registered
Events
Why do you care?
Loose coupling, autonomy, evolvability, scalability, resilience, traceability, replayability
EVENT-FIRST CHANGES HOW YOU
THINK ABOUT WHAT YOU ARE BUILDING
...more importantly...
23
Store events in ...a stream...
24
Different types of event models
● Change Data Capture - CDC (database txn log)
● Time series (IoT, metrics)
● Microservices (domain events)
25
Capture behavior
26
Time travel
user experience?
how many users affected?
has it happened before?
Ask many questions of the same data, again and again
time
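The "time travel" idea can be sketched in plain Java: because events are kept in an append-only log, the same history can be replayed later to answer a question nobody thought of when the events were written. The `EventLog` class and its method names are hypothetical, chosen for illustration only; a real system would replay a Kafka topic rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in for an event stream (not a Kafka API).
class EventLog {
    private final List<String> events = new ArrayList<>();

    // Append-only: events are facts, so they are never updated or deleted.
    void append(String event) {
        events.add(event);
    }

    // Replay the full history from the beginning and count matching events.
    long replayAndCount(Predicate<String> question) {
        return events.stream().filter(question).count();
    }

    public static void main(String[] args) {
        EventLog log = new EventLog();
        log.append("user-1:error");
        log.append("user-2:ok");
        log.append("user-1:error");

        // Two different questions asked of the same data.
        System.out.println("errors: " + log.replayAndCount(e -> e.endsWith(":error")));
        System.out.println("user-2 events: " + log.replayAndCount(e -> e.startsWith("user-2:")));
    }
}
```

Each new question is just another pass over the unchanged log, which is what makes "again and again" cheap to support.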
27
Evolvability
user experience?
how many users affected?
has it happened before?
new / old
supports data change, logic change, logic extension, schema evolution, loose coupling, add processors, A/B path
28
OLD: event-driven architectures
NEW: event-streaming architectures
Organise events as streams
Kafka is a database for events
Kafka cluster
stream processing
Kafka Streams
KSQL
Producer/Consumer
{
  "user": 100,
  "type": "bid",
  "item": 389,
  "cat": "bikes/mtb",
  "region": "dc-east"
}
Partitions give you horizontal scale
/bikes/ by item-id
[Diagram: events {...} are hashed by key (key#) across the key space onto the partitions of a topic; partition assignment maps partitions to consumers (stream processors)]
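A rough sketch of how keying by item-id spreads events across partitions while keeping every event for one item on the same partition. This is a simplification: Kafka's default partitioner hashes the serialized key with murmur2, whereas `String.hashCode()` is used here only to keep the example free of dependencies. The class and method names are hypothetical.

```java
// Illustrative only: not Kafka's partitioner, just the same idea (hash of key modulo partition count).
class PartitionSketch {
    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result in [0, numPartitions) even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 3;
        String[] itemIds = {"389", "390", "391", "389"};
        for (String itemId : itemIds) {
            // The same item-id always lands on the same partition, preserving per-item ordering.
            System.out.println("item " + itemId + " -> partition " + partitionFor(itemId, numPartitions));
        }
    }
}
```

Adding partitions raises the available parallelism, since each partition can be assigned to a different stream processor instance.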
31
Stream processing
Kafka Streams processor
input events
output events
...temporal reasoning...
event-driven microservice
32
It’s pretty powerful
[Diagram: Topic: click-stream with three partitions, three stream processors, Interactive query, CDC events from KTable (CDC Stream), CQRS, Elastic]
Streaming stack
Producer/Consumer: subscribe(), poll(), send(), flush(), beginTransaction(), …
Kafka Streams: KStream, KTable, filter(), map(), flatMap(), join(), aggregate(), transform(), …
KSQL: CREATE STREAM, CREATE TABLE, SELECT, JOIN, GROUP BY, SUM, …
KSQL UDFs
Ease of Use / Flexibility
34
KSQL
SELECT u.country, count(u.name)
FROM user_reg u
WINDOW TUMBLING (SIZE 1 MINUTE)
GROUP BY u.country;
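To make the windowed query above more concrete, here is a minimal plain-Java sketch of what a one-minute tumbling window count per country computes. It is not how KSQL is implemented; the class, method names and the "windowStart/country" key format are assumptions made for illustration, and timestamps are assumed to be non-negative epoch milliseconds.

```java
import java.util.Map;
import java.util.TreeMap;

class TumblingWindowSketch {
    static final long WINDOW_MS = 60_000L; // one minute

    // Tumbling windows are fixed-size and non-overlapping, so each timestamp
    // belongs to exactly one window, identified here by its start time.
    static long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % WINDOW_MS);
    }

    // Count registrations per (window start, country).
    static Map<String, Long> countByWindowAndCountry(long[] timestamps, String[] countries) {
        Map<String, Long> counts = new TreeMap<>();
        for (int i = 0; i < timestamps.length; i++) {
            String key = windowStart(timestamps[i]) + "/" + countries[i];
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] timestamps = {5_000L, 59_999L, 60_000L};
        String[] countries = {"UK", "UK", "UK"};
        // The first two events share the window starting at 0; the third opens a new window.
        System.out.println(countByWindowAndCountry(timestamps, countries));
    }
}
```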
35
Kafka Streams (DSL)
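// Assumes these exist in scope: StreamsBuilder builder, java.util.regex.Pattern pattern,
// Serde<String> stringSerde, Serde<Long> longSerde
KStream<String, String> userBids = builder.stream("stream-user-bids");
final KTable<String, Long> bidCounts = userBids
    // split each bid value into lower-case tokens
    .flatMapValues(value -> Arrays.asList(pattern.split(value.toLowerCase())))
    // group by the record key (the user) and count records per user
    .groupBy((user, token) -> user)
    .count();
bidCounts.toStream().to("streams-bid-count-output",
    Produced.with(stringSerde, longSerde));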
36
Streaming patterns
Stream processor
STREAM /user-reg
STREAM /address
FILTER: SELECT users > 18
PROJECT: SELECT user.name
JOIN: SELECT u.name, a.country from user-reg u JOIN address a WHERE u.id = a.id
GROUP BY (TABLE): SELECT u.country, count(u.name) FROM user-reg u GROUP BY u.country
WINDOW (TABLE): SELECT u.country, count(u.name) FROM user-reg u WINDOW TUMBLING (SIZE 1 MIN) group by u.country
37
Stream processors are uniquely convergent.
Data + Processing
(sorry, DBAs)
38
All of your data is a stream of events
39
Streams are your persistence model
They are also your local database
40
The atomic unit for tackling complexity
Stream processor
input events
output events
...or microservice or whatever...
41
Stream processor == Single atomic unit
It does one thing
Like
42
We think in terms of function
“Bounded Context”
(dataflow - choreography)
43
Let’s build something…
A simple dataflow series of processors
“Payment processing”
44
KPay looks like this:
https://github.com/confluentinc/demo-scene/tree/master/scalable-payment-processing
45
Bounded context
“Payments”
1. Payments inflight
2. Account processing [debit/credit]
3. Payments confirmed
46
Payments bounded context
choreography
47
Payments system: bounded context
[1] How much is being processed?
Expressed as:
- Count of payments inflight
- Total $ value processed
[2&3] Update the account balance
Expressed as:
- Debit
- Credit
[4] Confirm successful payment
Expressed as:
- Total volume today
- Total $ amount today
48
Payments system: AccountProcessor
// KTable state (Kafka Streams): fold each inflight payment into an AccountBalance per key
accountBalanceKTable = inflight.groupByKey()
    .aggregate(
        AccountBalance::new,
        (key, value, aggregate) -> aggregate.handle(key, value),
        accountStore);

// Advance each payment to its next state and re-key it by the payment's id
KStream<String, Payment>[] branch = inflight
    .map((KeyValueMapper<String, Payment, KeyValue<String, Payment>>)
        (key, value) -> {
            if (value.getState() == Payment.State.debit) {
                value.setStateAndId(Payment.State.credit);
            } else if (value.getState() == Payment.State.credit) {
                value.setStateAndId(Payment.State.complete);
            }
            return new KeyValue<>(value.getId(), value);
        })
    // (the slide cuts the call chain off here)
49
Payments system: AccountBalance
public AccountBalance handle(String key, Payment value) {
    this.name = value.getId();
    if (value.getState() == Payment.State.debit) {
        this.amount = this.amount.subtract(value.getAmount());
    } else if (value.getState() == Payment.State.credit) {
        this.amount = this.amount.add(value.getAmount());
    } else {
        // report to dead letter queue via exception handler
        throw new RuntimeException("Invalid payment received:" + value);
    }
    this.lastPayment = value;
    return this;
}
https://github.com/confluentinc/demo-scene/.../scalable-payment-processing/.../model/AccountBalance.java
50
Payments system: event model
https://github.com/confluentinc/demo-scene/.../scalable-payment-processing/.../io/confluent/kpay/payments
Event as API
51
Bounded context
“Payments”
Is it enough?
no
52
“It’s asynchronous, I don’t trust it”
(some developer, 2018)
53
We only have one part of the picture
○ What about failures?
○ Upgrades?
○ How fast is it going?
○ What is happening - is it working?
54
Composition
55
Event-streaming pillars:
1. Business function (payment)
2. Instrumentation plane (trust)
3. …
4. ...
56
Instrumentation Plane (trust)
Goal: Prove the application is meeting business requirements
Metrics:
- Payments Inflight, Count and Dollar value
- Payment Complete, Count and Dollar value
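A minimal sketch of the "payments inflight" metric, assuming a payment counts as inflight from the moment it enters the flow until it completes. The class and method names are hypothetical and this is plain Java, not the demo's Kafka Streams code; in a streaming implementation the same running count and total would typically live in a state store.

```java
import java.math.BigDecimal;

// Hypothetical sketch: tracks the count and total value of payments in flight.
class InflightStatsSketch {
    private long count = 0;
    private BigDecimal total = BigDecimal.ZERO;

    // A payment entered the flow.
    void onInflight(BigDecimal amount) {
        count++;
        total = total.add(amount);
    }

    // A payment completed, so it is no longer in flight.
    void onComplete(BigDecimal amount) {
        count--;
        total = total.subtract(amount);
    }

    long count() { return count; }

    BigDecimal total() { return total; }

    public static void main(String[] args) {
        InflightStatsSketch stats = new InflightStatsSketch();
        stats.onInflight(new BigDecimal("25.00"));
        stats.onInflight(new BigDecimal("10.50"));
        stats.onComplete(new BigDecimal("25.00"));
        System.out.println(stats.count() + " inflight, total " + stats.total());
    }
}
```

If the inflight count drifts upwards while the completed count stalls, payments are entering the flow but not finishing, which is the kind of evidence this plane exists to provide.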
57
Event-streaming pillars:
1. Business function (payment)
2. Instrumentation plane (trust)
3. Control plane (coordinate)
4. ...
58
Control Plane
Goal: Provide mechanisms to coordinate system behavior
Why: Recover from outage, DR, overload etc
Applied: Flow control, start, pause, bootstrap, scale, gate and rate limit
Model:
- Status [pause, resume]
- Gate processor [Status]
- etc
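The Status and gate processor model above might look roughly like this in plain Java. It is a sketch under assumptions: the names are hypothetical, control events are assumed to arrive from a separate control topic, and a rejected event is assumed to stay unconsumed so it can be retried once the gate reopens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical gate processor: forwards business events only while the status is RESUME.
class GateProcessorSketch {
    enum Status { PAUSE, RESUME }

    private Status status = Status.RESUME;
    private final List<String> forwarded = new ArrayList<>();

    // Control events (for example, read from a control topic) change the gate state.
    void onControl(Status newStatus) {
        status = newStatus;
    }

    // Returns true if the event was forwarded, false if the gate is closed
    // and the caller should leave the event unconsumed for now.
    boolean onEvent(String event) {
        if (status == Status.PAUSE) {
            return false;
        }
        forwarded.add(event);
        return true;
    }

    List<String> forwarded() {
        return forwarded;
    }

    public static void main(String[] args) {
        GateProcessorSketch gate = new GateProcessorSketch();
        gate.onEvent("payment-1");
        gate.onControl(Status.PAUSE);
        System.out.println("accepted while paused: " + gate.onEvent("payment-2"));
        gate.onControl(Status.RESUME);
        gate.onEvent("payment-2");
        System.out.println(gate.forwarded());
    }
}
```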
59
Event-streaming pillars:
1. Business function (payment)
2. Instrumentation plane (trust)
3. Control plane (coordinate)
4. Operational plane (run)
60
Operational Plane
Dependent on Control and Instrumentation planes
Dataflow patterns
● Application logs
● Error/Warning logs
● Audit logs
● Lineage
● Dead-letter-queues
/dead-letter/bid/region/processor
/ops/logs/category/
/ops/metrics/elast
Stream processor
61
Architectural pillars
[Diagram: four planes over the payments topics]
Core dataflow: /payments/incoming, PAY, /payments/confirmed
Control plane: /control/state, START, STOP, /control/status, stream.filter()
Instrumentation plane: /payments/confirmed, BIZ METRIC, IQ
Operational plane: /payments/dlq, ERROR, WARN, IQ
62
Payment system
63
Composition Patterns
64
Flickr: Dave DeGobbi
Single Bounded Context (dataflow)
Choreography:
- Capture business function as a bounded context
- Events as API
1. Payment Inflight
2. Accounts [from]
3. Accounts [to]
4. Payment Conf’d
Topics: payment.incoming, payment.inflight, payment.complete, payment.confirmed
Multiple Bounded contexts
Choreography:
- Chaining
- Layering
1. Payment
2. Logistics
Topics: payment.incoming, payment.complete
Multiple Bounded contexts
Orchestration
○ Captures workflow
○ Controls bounded context interaction
○ Business Process Model and Notation 2.0 (BPMN)
(Zeebe, Apache Airflow)
Source: https://docs.zeebe.io/bpmn-workflows/README.html
Central nervous system
[Diagram: apps and {faas} functions grouped into Payments, Department 2, Department 3 and Department 4]
Pattern: Central nervous system
What is going on here?
[Diagram: apps and {faas} functions grouped into Payments and Department 2]
Patterns: Topic naming
bikeshedding (uncountable)
1. Futile investment of time and energy in
discussion of marginal technical issues.
2. Procrastination.
https://en.wiktionary.org/wiki/bikeshedding
Parkinson observed that a committee whose
job is to approve plans for a nuclear power
plant may spend the majority of its time on
relatively unimportant but easy-to-grasp
issues, such as what materials to use for the
staff bikeshed, while neglecting the design of
the power plant itself, which is far more
important but also far more difficult to
criticize constructively.
Patterns: Topic conventions
Chris:
<message type>.<dataset name>.<data name>
Variants:
<app-context>.<message type>.<data name>
<dept>.<region?>.<app-group>.<app-name>.<message type>.<data name>
source: Chris Riccomini
https://riccomini.name/how-paint-bike-shed-kafka-topic-naming-conventions
Message types:
● logging
● queuing
● tracking
● etl/db
● streaming
● push
● user
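One way to stop the bikeshedding is to encode the convention once and validate names in code. The sketch below builds names of the form `<message type>.<dataset name>.<data name>`; the class name, the lower-case-and-hyphen rule for name parts, and the choice to write `etl/db` as `etl` are all assumptions made for illustration, not part of the cited convention.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that enforces one topic naming convention.
class TopicNameSketch {
    // Message types from the list above ("etl/db" shortened to "etl" to avoid a slash; an assumption).
    static final List<String> MESSAGE_TYPES = Arrays.asList(
        "logging", "queuing", "tracking", "etl", "streaming", "push", "user");

    static String topicName(String messageType, String datasetName, String dataName) {
        if (!MESSAGE_TYPES.contains(messageType)) {
            throw new IllegalArgumentException("Unknown message type: " + messageType);
        }
        for (String part : new String[] {datasetName, dataName}) {
            // Assumed house rule: lower-case letters, digits and hyphens only.
            if (!part.matches("[a-z0-9-]+")) {
                throw new IllegalArgumentException("Invalid name part: " + part);
            }
        }
        return messageType + "." + datasetName + "." + dataName;
    }

    public static void main(String[] args) {
        System.out.println(topicName("tracking", "payments", "confirmed"));
    }
}
```

The variants with app context or department prefixes could be handled the same way by adding leading parts to the builder.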
Model the organization
Cloud Events
https://cloudevents.io/
Example
{
"specversion" : "0.4-wip",
"type" : "com.github.pull.create",
"source" : "https://github.com/cloudevents/spec/pull",
"subject" : "123",
"id" : "A234-1234-1234",
"time" : "2018-04-05T17:31:00Z",
"comexampleextension1" : "value",
"comexampleextension2" : {
"othervalue": 5
},
"datacontenttype" : "text/xml",
"data" : "<much wow=\"xml\"/>"
}
Cloud Events
● Specification (an Event envelope)
● Transport Bindings (Kafka, AMQP, HTTP and others)
● SDKs: Java, .NET, Go, etc.
● Still early - nearly version 1.0
● Useful for exposing to external apps (Events as APIs)
● Support is coming for Kafka SerDes and CE.clients (sdk)
CloudEvents is a specification for describing event data in common formats
to provide interoperability across services, platforms and systems.
You will adopt CloudEvents
75
Best practice for scale:
● Organise into Apps
● Apps are composed of dataflows
● Controlling context - Events as APIs
● Build once and share (instrumentation, control, lineage)
● Dataflows are composed of topic contexts (.../app/proc1)
● Topic naming conventions at Enterprise and App Level (BikeShedding!)
● Enable self-service (discovery, authorisation etc)
● Automation everywhere
76
What about that software crisis that started in 1968?
“We believe that the major contributor to this complexity in many systems is the handling of state and the burden that this adds when trying to analyse and reason about the system.”
Out of the tar pit, 2006
Our mental model: Abstraction as an Art
Chained/Orchestrated
Bounded contexts
Stream processor
Stream
Event
Pillars
Business function Control plane Instrumentation Operations
Bounded context
Key takeaway (state)
Event-streaming-driven microservices are the atomic unit to:
1. Provide simplicity (and time travel)
2. Handle state (via Kafka Streams)
3. Provide a new paradigm: convergent data and logic processing
Stream
processor
Key takeaway (complexity)
● Event-Streaming apps: model as bounded-context dataflows, handle state & scaling
● Patterns: Build reusable dataflow patterns (instrumentation)
● Composition: Bounded contexts chaining and layering
● Composition: Choreography and Orchestration
80
Questions?
@avery_neil
“Journey to event driven” blog
1. Event-first thinking
2. Programming models
3. Serverless
4. Pillars of event-streaming ms’s
Series linked on the @avery_neil twitter profile
81
THANK YOU
@avery_neil
neil@confluent.io
cnfl.io/meetups cnfl.io/blog cnfl.io/slack

Kafka summit SF 2019 - the art of the event-streaming app

  • 1. 1 The art of the event streaming application. streams, stream processors and scale Neil Avery, Office of the CTO, @avery_neil
  • 2. 22
  • 3. 33 Out of the tar pit, 2006
  • 4. 44 “We believe that the major contributor to this complexity in many systems is the handling of state and the burden that this adds when trying to analyse and reason about the system.” Out of the tar pit, 2006
  • 5. 55 We like ‘all the things’
  • 6. 66
  • 8. What are microservices? Microservices are a software development technique - a variant of the service-oriented architecture (SOA) architectural style that structures an application as a collection of loosely coupled services. https://en.wikipedia.org/wiki/Microservices
  • 9. ...structures an application as a collection of loosely coupled services. https://en.wikipedia.org/wiki/Microservices (callout: this is new!)
  • 13. Handling state is hard: Cache? Embedded? Route to the right instance?
  • 15. What have we learned about microservices? ● Scaling is hard ● Handling state is hard ● Sharing and coordinating is hard ● Running a database in each microservice is hard
  • 16. Microservices: we had it all wrong (again). We didn’t really know what they were anyway. FIX: make them asynchronous and use streams of events
  • 18. Event-driven architectures aren’t new... but the world has changed
  • 21. An Event records the fact that something happened: a good was sold, an invoice was issued, a payment was made, a new customer registered
  • 22. Events: why do you care? Loose coupling, autonomy, evolvability, scalability, resilience, traceability, replayability. ...more importantly... EVENT-FIRST CHANGES HOW YOU THINK ABOUT WHAT YOU ARE BUILDING
  • 24. Different types of event models ● Change Data Capture - CDC (database txn log) ● Time series (IoT, metrics) ● Microservices (domain events)
  • 26. Time travel: ask many questions of the same data, again and again, over time. User experience? How many users affected? Has it happened before?
  • 27. Evolvability: supports data change, logic change, logic extension, schema evolution, loose coupling, adding processors, A/B paths. User experience? How many users affected? Has it happened before? (diagram labels: new, old)
  • 28. OLD: event-driven architectures. NEW: event-streaming architectures. Organise events as streams
  • 29. Kafka is a database for events. Diagram labels: Kafka cluster, stream processing, Kafka Streams, KSQL, Producer/Consumer
  • 30. Partitions give you horizontal scale. Example event: { user: 100, type: bid, item: 389, cat: bikes/mtb, region: dc-east }. Topic /bikes/ is keyed by item-id; the key hash (key#) maps the key space onto partitions. Diagram labels: Topic, Partition, Partition assignment, Consumer, Stream processor
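To make the key-to-partition idea on this slide concrete, here is a minimal sketch in plain Java. It is an illustration under stated assumptions, not Kafka's implementation: the `KeyPartitioner` class and its methods are hypothetical, and `Arrays.hashCode` stands in for the murmur2 hash that Kafka's default partitioner applies to the serialized key.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of any Kafka API.
class KeyPartitioner {
    // Maps a key to a partition index in [0, numPartitions).
    // Assumption: Arrays.hashCode is a stand-in for Kafka's murmur2 hash.
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions);
    }

    // Groups keys by the partition they hash to, preserving arrival order per partition.
    static Map<Integer, List<String>> assign(List<String> keys, int numPartitions) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String key : keys) {
            byPartition
                .computeIfAbsent(partitionFor(key, numPartitions), p -> new ArrayList<>())
                .add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        // Item ids used as keys; the repeated id 389 lands on the same partition both times.
        System.out.println(assign(List.of("389", "390", "391", "389"), 3));
    }
}
```

Because the partition depends only on the key, every event for one item is handled by the same stream processor instance, in order, which is what lets per-key state stay local.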
  • 31. Stream processing: a Kafka Streams processor with input events and output events (...temporal reasoning...); an event-driven microservice
  • 32. It’s pretty powerful. Diagram labels: Topic: click-stream, partitions, stream processors, Interactive query, CDC events from KTable, CDC Stream, CQRS, Elastic
  • 33. Streaming stack, from most flexible to easiest to use. Producer/Consumer: subscribe(), poll(), send(), flush(), beginTransaction(), … Kafka Streams: KStream, KTable, filter(), map(), flatMap(), join(), aggregate(), transform(), … KSQL: CREATE STREAM, CREATE TABLE, SELECT, JOIN, GROUP BY, SUM, … plus KSQL UDFs
  • 34. KSQL: SELECT u.country, count(u.name) FROM user-reg u WINDOW TUMBLING (SIZE 1 MINUTE) GROUP BY u.country;
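What a tumbling-window count computes can be sketched without Kafka at all. The following plain-Java example is an assumption-laden illustration: `TumblingWindowCount` and `Registration` are hypothetical names, timestamps are assumed to be non-negative epoch milliseconds, and the whole input is held in memory rather than consumed as a stream.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of a tumbling-window GROUP BY count.
class TumblingWindowCount {
    // Hypothetical event: a user registration with a country and an event-time timestamp.
    static class Registration {
        final String country;
        final long timestampMs;

        Registration(String country, long timestampMs) {
            this.country = country;
            this.timestampMs = timestampMs;
        }
    }

    // Counts events per "windowStart/country" key.
    // Each event belongs to exactly one window: [windowStart, windowStart + windowSizeMs).
    static Map<String, Long> countPerWindow(List<Registration> events, long windowSizeMs) {
        Map<String, Long> counts = new TreeMap<>();
        for (Registration e : events) {
            long windowStart = e.timestampMs - (e.timestampMs % windowSizeMs);
            counts.merge(windowStart + "/" + e.country, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Registration> events = List.of(
            new Registration("UK", 1_000L),
            new Registration("UK", 59_999L),
            new Registration("UK", 60_000L),
            new Registration("US", 61_000L));
        // One-minute windows, matching the SIZE on the slide.
        System.out.println(countPerWindow(events, 60_000L));
    }
}
```

Tumbling windows never overlap, so the window start alone identifies the window; hopping or session windows would need a different assignment rule.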
  • 35. Kafka Streams (DSL): KStream<String, String> userBids = builder.stream("stream-user-bids"); final KTable<String, Long> bidCounts = userBids .flatMapValues(value -> Arrays.asList(pattern.split(value.toLowerCase()))) .groupBy((user, count) -> user) .count(); bidCounts.toStream().to("streams-bid-count-output", Produced.with(stringSerde, longSerde));
  • 36. Streaming patterns, shown as a stream processor over STREAM /user-reg and STREAM /address. FILTER: SELECT users > 18. PROJECT: SELECT user.name. JOIN: SELECT u.name, a.country FROM user-reg u JOIN address a ON u.id = a.id. GROUP BY (TABLE): SELECT u.country, count(u.name) FROM user-reg u GROUP BY u.country. WINDOW (TABLE): SELECT u.country, count(u.name) FROM user-reg u WINDOW TUMBLING (SIZE 1 MINUTE) GROUP BY u.country
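The filter, project, join and group-by patterns listed on this slide can be sketched with `java.util.stream` over in-memory collections. This is a batch analogy only, not a streaming implementation: `StreamingPatterns`, `User` and the `countryByUserId` lookup map (standing in for the address stream materialized as a table) are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical batch analogy for the FILTER / PROJECT / JOIN / GROUP BY patterns.
class StreamingPatterns {
    static class User {
        final int id;
        final String name;
        final int age;

        User(int id, String name, int age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }
    }

    // FILTER: keep only users older than 18.
    static List<User> filterAdults(List<User> users) {
        return users.stream().filter(u -> u.age > 18).collect(Collectors.toList());
    }

    // PROJECT: keep only the name field.
    static List<String> projectNames(List<User> users) {
        return users.stream().map(u -> u.name).collect(Collectors.toList());
    }

    // JOIN: pair each user with a country looked up by user id; users without an address are dropped.
    static List<String> joinNameCountry(List<User> users, Map<Integer, String> countryByUserId) {
        return users.stream()
            .filter(u -> countryByUserId.containsKey(u.id))
            .map(u -> u.name + ":" + countryByUserId.get(u.id))
            .collect(Collectors.toList());
    }

    // GROUP BY: count users per country.
    static Map<String, Long> countByCountry(List<User> users, Map<Integer, String> countryByUserId) {
        return users.stream()
            .filter(u -> countryByUserId.containsKey(u.id))
            .collect(Collectors.groupingBy(
                u -> countryByUserId.get(u.id), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<User> users = List.of(new User(1, "ann", 30), new User(2, "bob", 17), new User(3, "cy", 45));
        Map<Integer, String> countries = Map.of(1, "UK", 3, "UK");
        System.out.println(projectNames(filterAdults(users)));
        System.out.println(joinNameCountry(users, countries));
        System.out.println(countByCountry(users, countries));
    }
}
```

In a real stream processor the group-by result is a continuously updated table rather than a one-off map, which is why the slide marks GROUP BY and WINDOW as TABLE.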
  • 37. Stream processors are uniquely convergent: Data + Processing (sorry, DBAs)
  • 38. All of your data is a stream of events
  • 39. Streams are your persistence model. They are also your local database
  • 40. The atomic unit for tackling complexity: the stream processor (input events, output events) ...or microservice or whatever...
  • 41. Stream processor == Single atomic unit. It does one thing. Like
  • 42. We think in terms of function: “Bounded Context” (dataflow - choreography)
  • 43. Let’s build something... A simple dataflow series of processors: “Payment processing”
  • 44. KPay looks like this: https://github.com/confluentinc/demo-scene/tree/master/scalable-payment-processing
  • 45. Bounded context “Payments”: 1. Payments inflight 2. Account processing [debit/credit] 3. Payments confirmed
  • 47. Payments system: bounded context. [1] How much is being processed? Expressed as: count of payments inflight, total $ value processed. [2&3] Update the account balance. Expressed as: debit, credit. [4] Confirm successful payment. Expressed as: total volume today, total $ amount today
  • 48. Payments system: AccountProcessor, KTable state (Kafka Streams): accountBalanceKTable = inflight.groupByKey() .aggregate( AccountBalance::new, (key, value, aggregate) -> aggregate.handle(key, value), accountStore); KStream<String, Payment>[] branch = inflight .map((KeyValueMapper<String, Payment, KeyValue<String, Payment>>) (key, value) -> { if (value.getState() == Payment.State.debit) { value.setStateAndId(Payment.State.credit); } else if (value.getState() == Payment.State.credit) { value.setStateAndId(Payment.State.complete); } return new KeyValue<>(value.getId(), value); })
  • 49. Payments system: AccountBalance: public AccountBalance handle(String key, Payment value) { this.name = value.getId(); if (value.getState() == Payment.State.debit) { this.amount = this.amount.subtract(value.getAmount()); } else if (value.getState() == Payment.State.credit) { this.amount = this.amount.add(value.getAmount()); } else { // report to dead letter queue via exception handler throw new RuntimeException("Invalid payment received:" + value); } this.lastPayment = value; return this; } https://github.com/confluentinc/demo-scene/.../scalable-payment-processing/.../model/AccountBalance.java
  • 50. Payments system: event model (Event as API) https://github.com/confluentinc/demo-scene/.../scalable-payment-processing/.../io/confluent/kpay/payments
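The aggregate step on this slide can be exercised without a Kafka cluster. Below is a minimal, self-contained sketch of the same debit/credit fold. The types are simplified, hypothetical stand-ins rather than the KPay classes, and the `deadLetters` list is an in-memory substitute for the dead-letter queue the slide's comment mentions.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical stand-ins for the KPay Payment and AccountBalance types.
class AccountBalanceSketch {
    enum State { debit, credit, complete }

    static class Payment {
        final String id;
        final BigDecimal amount;
        final State state;

        Payment(String id, BigDecimal amount, State state) {
            this.id = id;
            this.amount = amount;
            this.state = state;
        }
    }

    static class AccountBalance {
        BigDecimal amount = BigDecimal.ZERO;
        Payment lastPayment;

        // Same branching as the slide: debit subtracts, credit adds, anything else is rejected.
        AccountBalance handle(Payment value) {
            if (value.state == State.debit) {
                amount = amount.subtract(value.amount);
            } else if (value.state == State.credit) {
                amount = amount.add(value.amount);
            } else {
                throw new IllegalArgumentException("Invalid payment received: " + value.id);
            }
            lastPayment = value;
            return this;
        }
    }

    // Folds payments into one balance; rejected payments are collected in deadLetters
    // (an in-memory stand-in for a dead-letter topic).
    static AccountBalance aggregate(List<Payment> payments, List<Payment> deadLetters) {
        AccountBalance balance = new AccountBalance();
        for (Payment payment : payments) {
            try {
                balance.handle(payment);
            } catch (IllegalArgumentException e) {
                deadLetters.add(payment);
            }
        }
        return balance;
    }

    public static void main(String[] args) {
        List<Payment> deadLetters = new ArrayList<>();
        AccountBalance balance = aggregate(List.of(
            new Payment("p1", new BigDecimal("100.00"), State.credit),
            new Payment("p2", new BigDecimal("30.00"), State.debit),
            new Payment("p3", new BigDecimal("5.00"), State.complete)), deadLetters);
        System.out.println(balance.amount + " with " + deadLetters.size() + " dead letter(s)");
    }
}
```

The balance is only ever derived by replaying events through `handle`, which is the property that lets a KTable be rebuilt from its input stream.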
  • 52. “It’s asynchronous, I don’t trust it” (some developer, 2018)
  • 53. We only have one part of the picture ○ What about failures? ○ Upgrades? ○ How fast is it going? ○ What is happening - is it working?
  • 55. Event-streaming pillars: 1. Business function (payment) 2. Instrumentation plane (trust) 3. ... 4. ...
  • 56. Instrumentation Plane (trust). Goal: prove the application is meeting business requirements. Metrics: - Payments Inflight: count and dollar value - Payments Complete: count and dollar value
  • 57. Event-streaming pillars: 1. Business function (payment) 2. Instrumentation plane (trust) 3. Control plane (coordinate) 4. ...
  • 58. Control Plane. Goal: provide mechanisms to coordinate system behavior. Why: recover from outage, DR, overload etc. Applied: flow control, start, pause, bootstrap, scale, gate and rate limit. Model: - Status [pause, resume] - Gate processor [Status] - etc
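One way to picture the Status and Gate processor model is a small in-memory gate that reacts to control events. This is a sketch under assumptions, not the KPay control plane: `GateProcessor` is a hypothetical class, and buffering held events in memory is an illustrative choice (a Kafka-based gate could instead stop consuming and leave events in the topic).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical gate: forwards events while resumed, holds them while paused.
class GateProcessor<T> {
    enum Status { PAUSE, RESUME }

    private Status status = Status.RESUME;
    private final Deque<T> held = new ArrayDeque<>();
    private final Consumer<T> downstream;

    GateProcessor(Consumer<T> downstream) {
        this.downstream = downstream;
    }

    // Control-plane input: a status change. On RESUME, held events are released in arrival order.
    void onStatus(Status newStatus) {
        status = newStatus;
        if (status == Status.RESUME) {
            while (!held.isEmpty()) {
                downstream.accept(held.poll());
            }
        }
    }

    // Data-plane input: a business event, gated by the current status.
    void onEvent(T event) {
        if (status == Status.PAUSE) {
            held.add(event);
        } else {
            downstream.accept(event);
        }
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        GateProcessor<String> gate = new GateProcessor<>(out::add);
        gate.onEvent("pay-1");
        gate.onStatus(Status.PAUSE);
        gate.onEvent("pay-2");
        gate.onStatus(Status.RESUME);
        System.out.println(out);
    }
}
```

Keeping the status as its own event stream means the same control events can gate every processor in a bounded context, which fits the "build once and share" advice later in the deck.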
  • 59. Event-streaming pillars: 1. Business function (payment) 2. Instrumentation plane (trust) 3. Control plane (coordinate) 4. Operational plane (run)
  • 60. Operational Plane: dependent on Control and Instrumentation planes. Dataflow patterns ● Application logs ● Error/Warning logs ● Audit logs ● Lineage ● Dead-letter-queues. Example topics feeding a stream processor: /dead-letter/bid/region/processor, /ops/logs/category/, /ops/metrics/elast
  • 61. Architectural pillars. Core dataflow: /payments/incoming, PAY, /payments/confirmed. Control plane: /control/state (START, STOP), /control/status, stream.filter(). Instrumentation plane: /payments/confirmed, BIZ METRIC, IQ. Operational plane: /payments/dlq, ERROR, WARN, IQ
  • 65. Single Bounded Context (dataflow). Choreography: - Capture business function as a bounded context - Events as API. Processors: 1. Payment Inflight 2. Accounts [from] 3. Accounts [to] 4. Payment Conf’d. Topics: payment.incoming, payment.inflight, payment.complete, payment.confirmed
  • 66. Multiple Bounded contexts. Choreography: - Chaining - Layering. Contexts: 1. Payment 2. Logistics. Topics: payment.incoming, payment.complete
  • 67. Multiple Bounded contexts Orchestration ○ Captures workflow ○ Controls bounded context interaction ○ Business Process Model and Notation 2.0 (BPMN) (Zeebe, Apache Airflow) Source: https://docs.zeebe.io/bpmn-workflows/README.html
  • 68. Pattern: Central nervous system. Diagram labels: Payments, Department 2, Department 3, Department 4, each with apps and {faas}
  • 69. Patterns: Topic naming. What is going on here? (Diagram labels: Payments, Department 2, apps, {faas}.) bikeshedding (uncountable): 1. Futile investment of time and energy in discussion of marginal technical issues. 2. Procrastination. https://en.wiktionary.org/wiki/bikeshedding Parkinson observed that a committee whose job is to approve plans for a nuclear power plant may spend the majority of its time on relatively unimportant but easy-to-grasp issues, such as what materials to use for the staff bikeshed, while neglecting the design of the power plant itself, which is far more important but also far more difficult to criticize constructively.
  • 70. Patterns: Topic conventions. Chris: <message type>.<dataset name>.<data name>. Message types: ● logging ● queuing ● tracking ● etl/db ● streaming ● push ● user. Variants that model the organization: <app-context>.<message type>.<data name> and <dept>.<region?>.<app-group>.<app-name>.<message type>.<data name>. Source: Chris Riccomini https://riccomini.name/how-paint-bike-shed-kafka-topic-naming-conventions
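A naming convention is easier to keep when it is enforced in code. The sketch below builds names in the `<message type>.<dataset name>.<data name>` shape. `TopicNames` is a hypothetical helper; splitting the slide's "etl/db" entry into separate `etl` and `db` values is an assumption made here because `/` is not a legal character in a Kafka topic name (legal characters are ASCII alphanumerics, `.`, `_` and `-`).

```java
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper that enforces a <message type>.<dataset name>.<data name> convention.
class TopicNames {
    // Assumption: "etl/db" from the slide is modelled as two separate values.
    static final Set<String> MESSAGE_TYPES =
        Set.of("logging", "queuing", "tracking", "etl", "db", "streaming", "push", "user");

    // A segment may not contain '.', since '.' is the separator between segments.
    private static final Pattern SEGMENT = Pattern.compile("[a-zA-Z0-9_-]+");

    static String build(String messageType, String datasetName, String dataName) {
        if (!MESSAGE_TYPES.contains(messageType)) {
            throw new IllegalArgumentException("Unknown message type: " + messageType);
        }
        for (String segment : new String[] {datasetName, dataName}) {
            if (!SEGMENT.matcher(segment).matches()) {
                throw new IllegalArgumentException("Invalid topic segment: " + segment);
            }
        }
        return messageType + "." + datasetName + "." + dataName;
    }

    public static void main(String[] args) {
        System.out.println(build("tracking", "payments", "confirmed"));
    }
}
```

Routing every topic creation through one function like this turns the bikeshed discussion into a single reviewed change instead of a per-team debate.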
  • 73. Example { "specversion" : "0.4-wip", "type" : "com.github.pull.create", "source" : "https://github.com/cloudevents/spec/pull", "subject" : "123", "id" : "A234-1234-1234", "time" : "2018-04-05T17:31:00Z", "comexampleextension1" : "value", "comexampleextension2" : { "othervalue": 5 }, "datacontenttype" : "text/xml", "data" : "<much wow=\"xml\"/>" }
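An envelope like the one above is useful because consumers can check it before touching the payload. The sketch below checks the four context attributes that CloudEvents 1.0 marks as required (`id`, `source`, `specversion`, `type`). `CloudEventCheck` is a hypothetical helper that works on a plain `Map`; it does not use the CloudEvents SDK, and it only checks for presence as non-empty strings, not for valid URI or version syntax.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical envelope check on a plain Map; not the CloudEvents SDK.
class CloudEventCheck {
    // Required context attributes in CloudEvents 1.0.
    static final List<String> REQUIRED = List.of("id", "source", "specversion", "type");

    // Returns the required attributes that are absent or are not non-empty strings.
    static List<String> missingAttributes(Map<String, ?> event) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            Object value = event.get(name);
            if (!(value instanceof String) || ((String) value).isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, Object> event = new HashMap<>();
        event.put("specversion", "0.4-wip");
        event.put("type", "com.github.pull.create");
        event.put("source", "https://github.com/cloudevents/spec/pull");
        // "id" is left out on purpose to show a failing check.
        System.out.println(missingAttributes(event));
    }
}
```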
  • 74. CloudEvents: CloudEvents is a specification for describing event data in common formats to provide interoperability across services, platforms and systems. ● Specification (an Event envelope) ● Transport Bindings (Kafka, AMQP, HTTP, others) ● SDKs: Java, .NET, Go etc ● Still early - nearly version 1.0 ● Useful for exposing to external apps (Events as APIs) ● Support is coming for Kafka SerDes and CE.clients (sdk). You will adopt CloudEvents
  • 75. Best practice for scale: ● Organise into Apps ● Apps comprised of dataflows ● Controlling context - Events as APIs ● Build once and share (instrumentation, control, lineage) ● Dataflow comprised of topic context (.../app/proc1) ● Topic naming conventions at Enterprise and App Level (BikeShedding!) ● Enable self-service (discovery, authorisation etc) ● Automation everywhere
  • 76. What about that software crisis that started in 1968? “We believe that the major contributor to this complexity in many systems is the handling of state and the burden that this adds when trying to analyse and reason about the system.” Out of the tar pit, 2006
  • 77. Our mental model: Abstraction as an Art. Layers: Event, Stream, Stream processor, Bounded context, Chained/Orchestrated bounded contexts. Pillars: Business function, Control plane, Instrumentation, Operations
  • 78. Key takeaway (state): Event-streaming-driven microservices (stream processors) are the atomic unit to: 1. Provide simplicity (and time travel) 2. Handle state (via Kafka Streams) 3. Provide a new paradigm: convergent data and logic processing
  • 79. Key takeaway (complexity) ● Event-Streaming apps: model as bounded-context dataflows, handle state & scaling ● Patterns: Build reusable dataflow patterns (instrumentation) ● Composition: Bounded contexts chaining and layering ● Composition: Choreography and Orchestration
  • 80. Questions? @avery_neil. “Journey to event driven” blog series: 1. Event-first thinking 2. Programming models 3. Serverless 4. Pillars of event-streaming microservices. Series linked on the @avery_neil Twitter profile