PhillyJug Getting Started With Real-time Cloud Native Streaming With Java
16 Mar 2023
Live at the Judge Group in King of Prussia, Pennsylvania, hosted by the Philadelphia Java Users Group (PhillyJUG)
Streaming for Java Developers
Multiple users, frameworks, languages, devices, data sources & clusters
• Expert in ETL (Eating, Ties and Laziness)
• Deep SME in Buzzwords
• No Coding Skills
• R&D into Lasers
CAT AI
• Will Drive your Car?
• Will Fix Your Code?
• Will Beat You At Q-Bert
• Will Write my Next Talk
STREAMING ENGINEER
• Coding skills in Python, Java
• Experience with Apache Kafka
• Knowledge of database query languages such as SQL
• Knowledge of tools such as Apache Flink, Apache Spark and Apache NiFi
JAVA DEVELOPER
• Frameworks like Spring, Quarkus and Micronaut
• Relational Databases, SQL
• Cloud
• Dev and Build Tools
Run a Local Standalone on Bare Metal
wget https://archive.apache.org/dist/pulsar/pulsar-2.11.0/apache-pulsar-2.11.0-bin.tar.gz
tar xvfz apache-pulsar-2.11.0-bin.tar.gz
cd apache-pulsar-2.11.0
bin/pulsar standalone
https://pulsar.apache.org/docs/en/standalone/
<or> Run in Docker
docker run -it \
  -p 6650:6650 \
  -p 8080:8080 \
  --mount source=pulsardata,target=/pulsar/data \
  --mount source=pulsarconf,target=/pulsar/conf \
  apachepulsar/pulsar:2.11.0 \
  bin/pulsar standalone
https://pulsar.apache.org/docs/en/standalone-docker/
Building Tenant, Namespace, Topics
bin/pulsar-admin tenants create meetup
bin/pulsar-admin namespaces create meetup/philly
bin/pulsar-admin tenants list
bin/pulsar-admin namespaces list meetup
bin/pulsar-admin topics create persistent://meetup/philly/first
bin/pulsar-admin topics list meetup/philly
https://github.com/tspannhw/Meetup-YourFirstEventDrivenApp
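Once the topic exists, a first message can be published from Java. A minimal sketch using the Apache Pulsar Java client, assuming a standalone broker on localhost:6650 and the pulsar-client dependency on the classpath:

```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class FirstProducer {
    public static void main(String[] args) throws Exception {
        // Connect to the local standalone cluster
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Produce a String message to the topic created above
        Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("persistent://meetup/philly/first")
                .create();

        producer.send("Hello PhillyJUG!");

        producer.close();
        client.close();
    }
}
```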
Messages - The Basic Unit of Pulsar
Component | Description
Value / data payload | The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas.
Key | Messages are optionally tagged with keys, which are used in partitioning and are also useful for things like topic compaction.
Properties | An optional key/value map of user-defined properties.
Producer name | The name of the producer who produced the message. If you do not specify a producer name, the default name is used.
Sequence ID | Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence.
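These fields map directly onto the Java producer API. A hedged sketch (the topic, key, property and producer names below are illustrative, not from the deck):

```java
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class MessageAnatomy {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("persistent://meetup/philly/first")
                .producerName("meetup-producer")   // producer name field
                .create();

        MessageId id = producer.newMessage()
                .key("device-42")                  // optional key, used for partitioning/compaction
                .property("source", "phillyjug")   // user-defined property
                .value("temperature=72F")          // value / data payload
                .send();

        System.out.println("Published: " + id);
        producer.close();
        client.close();
    }
}
```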
Pulsar Cluster
● “Bookies” (store messages)
● Store messages and cursors
● Messages are grouped in segments/ledgers
● A group of bookies forms an “ensemble” to store a ledger
● “Brokers”
● Handle message routing and connections
● Stateless, but with caches
● Automatic load-balancing
● Topics are composed of multiple segments
● Metadata Storage (e.g. Apache ZooKeeper)
● Stores metadata for both Pulsar and BookKeeper
● Service discovery
Producer-Consumer
Producer: the publisher sends data to a topic and doesn't know about the subscribers or their status. All interactions go through Pulsar, which handles all communication.
Consumer: the subscriber receives data from the topic and never directly interacts with the publisher.
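On the consuming side, the subscriber only ever talks to Pulsar. A minimal receive-and-acknowledge loop (sketch; the subscription name is illustrative):

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class FirstConsumer {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("persistent://meetup/philly/first")
                .subscriptionName("my-sub")
                .subscribe();

        while (true) {
            Message<String> msg = consumer.receive();  // blocks until a message arrives
            System.out.println("Got: " + msg.getValue());
            consumer.acknowledge(msg);                 // tell Pulsar the message was processed
        }
    }
}
```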
Apache Pulsar: Messaging vs Streaming
Message Queueing - Queueing systems are ideal for work queues that do not require tasks to be performed in a particular order.
Streaming - Streaming works best in situations where the order of messages is important.
Pulsar Subscription Modes
Different subscription modes have different semantics:
Exclusive/Failover - guaranteed order, single active consumer
Shared - multiple active consumers, no order
Key_Shared - multiple active consumers, order for a given key
Producer 1
Producer 2
Pulsar Topic
Subscription D
Consumer D-1
Consumer D-2
Key_Shared
<K1,V10> <K1,V11> <K1,V12> <K2,V20> <K2,V21> <K2,V22>
Subscription C
Consumer C-1
Consumer C-2
Shared
<K1,V10> <K2,V21> <K1,V12> <K2,V20> <K1,V11> <K2,V22>
Subscription A Consumer A
Exclusive
Subscription B
Consumer B-1
Consumer B-2
In case of failure in
Consumer B-1
Failover
Flexible Pub/Sub API for Pulsar - Shared
Consumer<byte[]> consumer = client.newConsumer()
    .topic("my-topic")
    .subscriptionName("work-q-1")
    .subscriptionType(SubscriptionType.Shared)
    .subscribe();
Flexible Pub/Sub API for Pulsar - Failover
Consumer<byte[]> consumer = client.newConsumer()
    .topic("my-topic")
    .subscriptionName("stream-1")
    .subscriptionType(SubscriptionType.Failover)
    .subscribe();
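The Key_Shared mode from the earlier slide follows the same pattern. A self-contained sketch (subscription name is illustrative; assumes a broker on localhost:6650):

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class KeySharedConsumer {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Multiple consumers may attach; messages with the same key
        // always go to the same consumer, preserving per-key order.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("my-topic")
                .subscriptionName("key-shared-1")
                .subscriptionType(SubscriptionType.Key_Shared)
                .subscribe();

        consumer.close();
        client.close();
    }
}
```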
StreamNative Pulsar Ecosystem - hub.streamnative.io
Data Offloaders (Tiered Storage)
Client Libraries
Connectors (Sources & Sinks)
Protocol Handlers
Pulsar Functions (Lightweight Stream Processing)
Processing Engines
… and more!
Schema Registry
Producers send schema-1 / schema-2 / schema-3 (value=Avro/Protobuf/JSON) data serialized per schema ID, and register the schema with the broker if it is not in their local cache. Consumers read that data deserialized per schema ID, fetching the schema by ID if it is not in their local cache. Both producers and consumers keep a local cache of schemas.
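In the Java client this registration and lookup is largely transparent: passing a typed schema to the producer triggers it automatically. A sketch using a hypothetical POJO and topic name (both are assumptions, not from the deck):

```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class SchemaExample {
    // Hypothetical payload type; the JSON schema is derived from its fields
    public static class SensorReading {
        public String sensorId;
        public double temperature;
    }

    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Schema.JSON registers the schema with the broker on first use
        Producer<SensorReading> producer = client
                .newProducer(Schema.JSON(SensorReading.class))
                .topic("persistent://meetup/philly/readings")
                .create();

        SensorReading r = new SensorReading();
        r.sensorId = "greenhouse-1";
        r.temperature = 72.0;
        producer.send(r);

        producer.close();
        client.close();
    }
}
```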
AI-BASED ENHANCEMENTS
[Architecture diagram: EDGE sensors (sensor id, temperature, humidity, timestamp) with edge management and a control system → COLLECT (message broker, data flow, distribute) → CLOUD → ENRICH, REPORT (SQL Stream Builder, real-time alerting, data visualization) → SERVE (data warehouse, machine learning to predict and automate, reports)]

COLLECT AND DISTRIBUTE DATA
1. Data is collected from sensors that use the MQTT protocol via CEM and sent to CDP Public Cloud.
2. Two CDF flows run in the cloud to accomplish our two goals: streaming analytics and batch analytics.

USE DATA
3. Streaming use case: some conditions in our greenhouse must be avoided and have to be controlled in real time. Warnings have been defined to alert us; when alerting conditions are met, the control system is automatically activated to adjust environmental variables.
4. Batch analytics: to ensure the optimal growth of our plants, the ideal conditions have to be met for each plant. Every 6 hours the plant conditions are monitored, and if a control adjustment is required, an ML model suggests how to reach the optimal point while minimizing cost.