Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
https://github.com/tspannhw/FLiP-Pi-DeltaLake-Thermal/
Tim Spann
Streamnative
https://github.com/tspannhw/pulsar-pychat-function
https://events.dataidols.com/talks/using-the-flipn-stack-for-edge-ai-flink-nifi-pulsar/
Hosted by Karen Jean-Francois.
Introducing the FLiPN stack which combines Apache Flink, Apache NiFi, Apache Pulsar and other Apache tools to build fast applications for IoT, AI, rapid ingest.
FLiPN provides a quick set of tools to build applications at any scale for any streaming and IoT use cases.
Tools Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, Apache MXNet, DJL.AI
References https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html
Data Idols - Summer School - Data Science
2. Tim Spann
Developer Advocate
Tim Spann, Developer Advocate at StreamNative
● FLiP(N) Stack = Flink, Pulsar and NiFI Stack
● Streaming Systems & Data Architecture Expert
● Experience:
○ 15+ years of experience with streaming technologies including Pulsar,
Flink, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and more.
○ Today, he helps to grow the Pulsar community sharing rich technical
knowledge and experience at both global conferences and through
individual conversations.
3.
4. 4
Introducing the FLiPN stack which
combines Apache Flink, Apache NiFi,
Apache Pulsar and other Apache
tools to build fast applications for IoT,
AI, rapid ingest. FLiPN provides a
quick set of tools to build applications
at any scale for any streaming and
IoT use cases. Tools Apache Flink,
Apache Pulsar, Apache NiFi, MiNiFi,
Apache MXNet, DJL.AI References
5. 5
● Learn how to build an end-to-end streaming edge app
● How to pull messages from Pulsar topics
● Building a data stream for IoT with NiFi
● Using Apache Flink + Apache Pulsar
● Using Apache Spark / Delta Lake / + Apache Pulsar
AGENDA
6. IoT DATA
IoT Ingestion: High-volume
streaming sources, sensors,
multiple message formats,
diverse protocols and
multi-vendor devices
creates data ingestion
challenges.
Other Sources: Transit data,
news, twitter, status feeds,
REST data, stock data and
more.
9. ➔ Perform in Real-Time
➔ Process Events as They Happen
➔ Joining Streams with SQL
➔ Find Anomalies Immediately
➔ Ordering and Arrival Semantics
➔ Continuous Streams of Data
DATA STREAMING
13. Messaging
Ideal for work queues that do not
require tasks to be performed in a
particular order—for example,
sending one email message to many
recipients.
RabbitMQ and Amazon SQS are
examples of popular queue-based
message systems.
PULSAR: UNIFIED MESSAGING + DATA
STREAMING
14. PULSAR: UNIFIED MESSAGING + DATA
STREAMING
.. and Streaming
Works best in situations where the order
of messages is important—for example,
data ingestion.
Kafka and Amazon Kinesis are examples
of messaging systems that use
streaming semantics for consuming
messages.
15. PULSAR PUB/SUB MODEL
Producer Consumer
Publisher sends data and
doesn't know about the
subscribers or their status.
All interactions go through
Pulsar, and it handles all
communication.
Subscriber receives data
from publisher and never
directly interacts with it.
Topic
Topic
18. ● “Bookies”
● Stores messages and cursors
● Messages are grouped in
segments/ledgers
● A group of bookies form an “ensemble”
to store a ledger
● “Brokers”
● Handles message routing and
connections
● Stateless, but with caches
● Automatic load-balancing
● Topics are composed of
multiple segments
●
● Stores metadata for both
Pulsar and BookKeeper
● Service discovery
Store
Messages
Metadata &
Service Discovery
Metadata &
Service Discovery
KEY PULSAR CONCEPTS: ARCHITECTURE
MetaData
Storage
19. Component Description
Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although
message data can also conform to data schemas.
Key Messages are optionally tagged with keys, used in partitioning and also is useful for
things like topic compaction.
Properties An optional key/value map of user-defined properties.
Producer name The name of the producer who produces the message. If you do not specify a producer
name, the default name is used. Message De-Duplication.
Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of
the message is its order in that sequence. Message De-Duplication.
MESSAGES - THE BASIC UNIT OF PULSAR
27. APACHE PULSAR TO IO SINK
https://pulsar.apache.org/docs/en/io-overview/
28.
29. ● Consume messages from one
or more Pulsar topics.
● Apply user-supplied processing
logic to each message.
● Publish the results of the
computation to another topic.
● Support multiple programming
languages (Java, Python, Go)
● Can leverage 3rd-party libraries
to support the execution of ML
models on the edge.
PULSAR FUNCTIONS
30. PULSAR IO FUNCTIONS IN PYTHON
https://github.com/tspannhw/pulsar-pychat-function
31. ● Apache Pulsar’s two-tier architecture separates the compute and storage
layers, and interact with one another over a TCP/IP connection. This allows us
to run the computing layer (Broker) on either Edge servers or IoT Gateway
devices.
● Pulsar’s serverless computing framework, know as Pulsar Functions, can run
inside the Broker as threads. Effectively “stretching” the data processing layer.
EDGE COMPUTING WITH PULSAR
32. ● Pulsar’s Serverless computing framework can run inside the Pulsar Broker as a
thread pool. This framework can be used as the execution environment for ML
models.
● The Apache Pulsar Broker supports the MQTT protocol and therefore can
directly receive incoming data from the sensor hubs and store it in a topic.
BENEFITS OF EDGE COMPUTING WITH
PULSAR
33. PULSAR FUNCTION – 3RD PARTY LIBRARIES
● You can leverage 3rd
party
libraries within Pulsar Functions
● DeepLearning4J
● JPMML
● DJL.AI
● Keras
● Pulsar Functions are able to
support:
● A variety of ML model types.
● Models developed with
different languages and
toolkits
34. INTERACTION WITH MODEL SERVERS
● You can write an application in any language that works
with Apache Pulsar via libraries and calls any model server
(AWSLabs Multi-Model Server, Cloudera Data Science
Workbench Model Server and many others) via REST or
other protocols.
● Pulsar Functions can execute local models or call model
servers.
35. ML
• Visual Question and Answer
• NLP (Natural Language Processing)
• Sentiment Analysis
• Text Classification
• Named Entity Recognition
• Content-based Recommendations
• Predictive Maintenance
• Fault Detection
• Fraud Detection
• Time-Series Predictions
• Naive Bayes
47. SQL
select aqi, parameterName, dateObserved, hourObserved, latitude,
longitude, localTimeZone, stateCode, reportingArea from
airquality
select max(aqi) as MaxAQI, parameterName, reportingArea from
airquality group by parameterName, reportingArea
select max(aqi) as MaxAQI, min(aqi) as MinAQI, avg(aqi) as
AvgAQI, count(aqi) as RowCount, parameterName, reportingArea from
airquality group by parameterName, reportingArea
65. Passionate and dedicated team.
Founded by the original developers of
Apache Pulsar.
StreamNative helps teams to capture,
manage, and leverage data using
Pulsar’s unified messaging and
streaming platform.
66. Apache Pulsar Training
• Instructor-led courses
• Pulsar Fundamentals
• Pulsar Developers
• Pulsar Operations
• On-demand learning with labs
• 300+ engineers, admins and architects trained!
STREAMNATIVE ACADEMY
Now Available
On-Demand
Pulsar Training
Academy.StreamNative.i
o
67. Hosted by
Save Your Spot Now
Use code SUMMIT20
to get 20% off.
Pulsar Summit
San Francisco
Hotel Nikko
August 18 2022
Summit Highlights:
● 5 Keynotes
● 12 Breakout Sessions
● 1 Amazing Happy Hour
Speakers from:
68. Pulsar Summit
San Francisco
Sponsorship
Prospectus
Community Sponsorships Available
Help engage and connect the Apache Pulsar
community by becoming an official sponsor
for Pulsar Summit San Francisco 2022! Learn
more about the requirements and benefits of
becoming a community sponsor.
Hosted by