Apache Kafka is the backbone for building architectures that deal with billions of events a day. Chris Castle, Developer Advocate, will show you where it might fit in your roadmap.
- What Apache Kafka is and how to use it on Heroku
- How Kafka enables you to model your data as immutable streams of events, introducing greater parallelism into your applications
- How you can use it to solve scale problems across your stack such as managing high throughput inbound events and building data pipelines
Learn more at https://www.heroku.com/kafka
Reveal.js version of slides: http://slides.com/christophercastle/deck#/
2. What problems does Apache Kafka solve?
What are the core concepts of Kafka?
Why Apache Kafka on Heroku?
5. Event-Driven Architecture
Event-driven architecture (EDA), also known as message-driven architecture, is a software architecture pattern promoting the production, detection, consumption of, and reaction to events.
Source: Wikipedia
6. What Are Events?
Context
When was the event? (event time, process time)
What produced the event? (causal history, device, etc)
Where did the event occur? (system location, geo location)
Operation
What function was applied? (create, update, delete, etc)
What are the characteristics of the function?
State
What is the data involved in the event?
How is that data identified?
"Contextualized operation on state"
7. Event Examples
Product views
Completed sales
Page visits
Site logins
Shipping notifications
Inventory received
IoT sensor values
Weather data
Traffic data
Tweets
Election polling data!
Example: Completed sale
Context: 2016-11-03T15:13:27Z; Retail www site; referrer Google search
Operation: Inventory item purchased
State: Amazon Echo, Black; $179.99; ID B00X4WHP5E
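The "contextualized operation on state" idea can be sketched as a small immutable record. This is only an illustration of the completed-sale example: the class names, field names (`event_time`, `source`, `referrer`, `price_usd`), and the JSON layout are assumptions made for this sketch, not a schema from the talk or from Kafka.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json


@dataclass(frozen=True)
class Context:
    event_time: datetime  # when the event happened
    source: str           # what produced the event
    referrer: str         # part of the causal history


@dataclass(frozen=True)
class Event:
    context: Context
    operation: str        # the function applied
    state: dict           # the data involved and how it is identified


def to_json(event: Event) -> str:
    """Serialize an event to a JSON string (hypothetical wire format)."""
    payload = asdict(event)
    payload["context"]["event_time"] = event.context.event_time.isoformat()
    return json.dumps(payload, sort_keys=True)


# The completed-sale example, expressed with the assumed field names.
sale = Event(
    context=Context(
        event_time=datetime(2016, 11, 3, 15, 13, 27, tzinfo=timezone.utc),
        source="Retail www site",
        referrer="Google search",
    ),
    operation="Inventory item purchased",
    state={"item": "Amazon Echo, Black", "price_usd": "179.99", "id": "B00X4WHP5E"},
)

print(to_json(sale))
```

Marking the dataclasses as frozen mirrors the idea that an event, once recorded, is not modified; a correction would be a new event.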
9. Why Should I Care?
Inbound Streams
Scaling too slowly leads to dropped data
Overprovisioning leads to inefficient systems
Backpressure and other coordination is hard!
Data Pipelines
Dataflow between processing stages requires coordination
Parallel pipelines with the same data can be nontrivial
Provenance and auditability!?
Microservices
Service discovery must support current and future processes
Sequencing service availability is critical to system function
Possible loss of state when individual services fail
10. Why Should I Care?
Inbound Streams
Event streams in Kafka allow other resources to pull when ready
Resources can fail and reconnect without dropping events
Kafka provides elasticity, reducing the need for backpressure
Data Pipelines
Dataflow coordination is reduced via event stream structure
The immutability of data allows for trivial parallel processing
Tracking provenance and lineage of data becomes possible
Microservices
Services now only need to discover topics in Kafka
Service availability sequencing is relaxed
Inter-service communication is more robust
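A toy model may help show why pull-based reads let consumers fail and reconnect without dropping events. The sketch below uses an in-memory list in place of a Kafka topic, and the `EventLog` and `Consumer` classes are hypothetical names invented here. In a real cluster the offsets are tracked per consumer group by Kafka rather than held in a Python object.

```python
class EventLog:
    """Append-only list of events; a stand-in for a single-partition topic."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read(self, offset, max_events):
        """Return up to max_events events starting at offset."""
        return self._events[offset:offset + max_events]


class Consumer:
    """Pulls from the log at its own pace and remembers its own offset."""

    def __init__(self, log):
        self.log = log
        self.offset = 0  # next offset to read
        self.seen = []

    def poll(self, max_events=2):
        batch = self.log.read(self.offset, max_events)
        self.seen.extend(batch)
        self.offset += len(batch)
        return batch


log = EventLog()
for name in ["page_visit", "login", "product_view", "sale", "shipping"]:
    log.append(name)

dashboard = Consumer(log)  # fast consumer
audit = Consumer(log)      # slow consumer with an independent offset

while dashboard.poll():    # drains everything currently in the log
    pass

audit.poll()                       # reads two events, then "goes offline"
log.append("inventory_received")   # events keep arriving in the meantime
while audit.poll():                # on reconnect, resumes from its offset
    pass

print(audit.seen)
```

Because the log is never mutated by a read, both consumers see the same events in the same order, and the slow consumer catches up without the producer having to wait for it.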
11. Use Cases
Heroku Platform Event Stream
Learn more at
https://blog.heroku.com/powering-the-heroku-platform-api-a-distributed-systems-approach-using-streams-and-apache-kafka
17. Apache Kafka Core Concepts
Brokers
The instances running Kafka and managing streams of events in a cluster.
Producers + Consumers
Clients that write to or read from a Kafka cluster.
Topics
Streams of events that are replicated across the brokers. Configured with time-based retention or log compaction.
Partitions
Discrete subsets of topics, and important tuning points for parallelism and ordering.
[Diagram labels: Producers, Consumers, Broker, Topic, Partition]
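To illustrate why partitions are a tuning point for both parallelism and ordering, here is a minimal sketch of key-based routing. It assumes events carry a key and that the key's hash selects the partition; `zlib.crc32` is used only as a deterministic stand-in and is not the hash a Kafka client uses. The partition count and the `partition_for` and `route` helpers are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for one topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition index using a stand-in hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions: int = NUM_PARTITIONS):
    """Append each (key, value) event to the partition chosen by its key."""
    partitions = defaultdict(list)
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "product_view"),
    ("user-2", "product_view"),
    ("user-1", "sale"),
]
partitions = route(events)

for index in sorted(partitions):
    print(index, partitions[index])
```

All events with the same key land in the same partition, so their relative order is preserved there. Events with different keys may land in different partitions, which can be read in parallel but carry no ordering guarantee relative to each other.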
18. Example Producers
Event: Producer
Product views: Web server
Completed sales: Payment processor
Page visits: Browser
Site logins: Authentication service
Shipping notifications: Shipping provider
Inventory received: Warehouse
IoT data: Motion sensor
Weather data: Rain gauge
Traffic data: Vehicle sensor
Tweets: Twitter
Election polling data!: Online/phone survey
19. Example Consumers
Event: Consumer
Product views: Personalization engine
Completed sales: Accounting system
Page visits: Reporting dashboard
Site logins: Security audit service
Shipping notifications: Shipping provider
Inventory received: Inventory database
IoT data: Actuator
Weather data: Climate model
Traffic data: Traffic map
Tweets: Analytics dashboard
Election polling data!: Election forecast
25. Without Heroku
Apache Kafka
The heart of the event management system, with a broad variety of configurations and options.
Apache ZooKeeper
The system's consensus and coordination cluster is vital for Kafka's operation.
OS + JVM Tuning
Tuning the cluster runtimes can be an art.
Instances + Networking
Physical or virtual, the infrastructure behind clusters must be well considered.
Myriad Moving Pieces
28. Apache Kafka on Heroku
Experienced Staff
Self-Healing
Current Version
No-Downtime Upgrades
Heroku engineers have contributed patches to the core open source Kafka project.
29. Apache Kafka on Heroku
Global
US West
US East
Ireland
Germany
Japan
Sydney
30. Let's Review...
...and get you started with Kafka!
Apache Kafka is a valuable tool for building architectures to support inbound event streams, data processing pipelines, and microservices coordination.
The primitives provided by Kafka -- topics, partitions, retention duration, log compaction, and replication -- provide the tools to manage structured event streams.
Apache Kafka on Heroku simplifies operational complexity so that any developer can get started quickly and feel confident that their application is supported by a rock-solid, production service.
Get started at hrku.co/use-kafka
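Two of the primitives named in the review, retention duration and log compaction, can be pictured with a small conceptual model. The sketch assumes a log of `(key, value, timestamp)` tuples, and the `compact` and `apply_retention` helpers are hypothetical. A real Kafka cluster applies these policies to log segments in the background rather than to a Python list.

```python
def compact(log):
    """Keep only the latest record per key, in original log order."""
    latest_offset = {}
    for offset, (key, _value, _timestamp) in enumerate(log):
        latest_offset[key] = offset
    keep = set(latest_offset.values())
    return [record for offset, record in enumerate(log) if offset in keep]


def apply_retention(log, now, max_age_seconds):
    """Keep only records whose age is at most max_age_seconds."""
    return [record for record in log if now - record[2] <= max_age_seconds]


# Hypothetical inventory updates: (key, value, timestamp in seconds).
log = [
    ("sku-1", {"stock": 10}, 100),
    ("sku-2", {"stock": 4}, 160),
    ("sku-1", {"stock": 9}, 220),
    ("sku-2", {"stock": 3}, 280),
    ("sku-1", {"stock": 8}, 340),
]

print(compact(log))
print(apply_retention(log, now=400, max_age_seconds=200))
```

Compaction suits topics where only the current value per key matters, while time-based retention suits topics where recent history matters regardless of key.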
31. Q&A
Rand Fitzpatrick, Director of Product
Chris Castle, Developer Advocate
32. Learn More
Apache Kafka on Heroku: https://www.heroku.com/kafka
Get Started: https://elements.heroku.com/addons/heroku-kafka
Documentation: https://devcenter.heroku.com/articles/kafka-on-heroku
Kafka Event Stream Modeling: https://devcenter.heroku.com/articles/kafka-event-stream-modeling
Podcast: Managed Kafka with Heroku Engineer Tom Crayford: http://softwareengineeringdaily.com/2016/10/25/managed-kafka-with-tom-crayford/