Publicité
Publicité

Contenu connexe

Similaire à Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) (20)

Plus de Timothy Spann(20)

Publicité

Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)

  1. Using the FLiPN Stack for Edge AI Tim Spann | Developer Advocate
  2. Tim Spann Developer Advocate Tim Spann, Developer Advocate at StreamNative ● FLiP(N) Stack = Flink, Pulsar and NiFI Stack ● Streaming Systems & Data Architecture Expert ● Experience: ○ 15+ years of experience with streaming technologies including Pulsar, Flink, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and more. ○ Today, he helps to grow the Pulsar community sharing rich technical knowledge and experience at both global conferences and through individual conversations.
  3. 4 Introducing the FLiPN stack which combines Apache Flink, Apache NiFi, Apache Pulsar and other Apache tools to build fast applications for IoT, AI, rapid ingest. FLiPN provides a quick set of tools to build applications at any scale for any streaming and IoT use cases. Tools Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, Apache MXNet, DJL.AI References
  4. 5 ● Learn how to build an end-to-end streaming edge app ● How to pull messages from Pulsar topics ● Building a data stream for IoT with NiFi ● Using Apache Flink + Apache Pulsar ● Using Apache Spark / Delta Lake / + Apache Pulsar AGENDA
  5. IoT DATA IoT Ingestion: High-volume streaming sources, sensors, multiple message formats, diverse protocols and multi-vendor devices creates data ingestion challenges. Other Sources: Transit data, news, twitter, status feeds, REST data, stock data and more.
  6. ● Apache Flink ● Apache Pulsar ● Pulsar Functions ● Apache NiFi ● Apache Spark ● Python, Java, Golang FLIP(N)(S) STACK https://streamnative.io/blog/engineering/2022-04-14-what-the-flip-is-the-flip-stack/
  7. ➔ Perform in Real-Time ➔ Process Events as They Happen ➔ Joining Streams with SQL ➔ Find Anomalies Immediately ➔ Ordering and Arrival Semantics ➔ Continuous Streams of Data DATA STREAMING
  8. Sensors <-> STREAMING EDGE APPS StreamNative Hub StreamNative Cloud Unified Batch and Stream COMPUTING Batch (Batch + Stream) Unified Batch and Stream STORAGE Offload (Queuing + Streaming) Tiered Storage Pulsar --- KoP --- MoP --- Websocket Pulsar Sink Streaming Edge Gateway Protocols <-> Sensors <-> Apps
  9. WHAT IS APACHE PULSAR?
  10. 101 Unified Messaging Platform Guaranteed Message Delivery Resiliency Infinite Scalability
  11. Messaging Ideal for work queues that do not require tasks to be performed in a particular order—for example, sending one email message to many recipients. RabbitMQ and Amazon SQS are examples of popular queue-based message systems. PULSAR: UNIFIED MESSAGING + DATA STREAMING
  12. PULSAR: UNIFIED MESSAGING + DATA STREAMING .. and Streaming Works best in situations where the order of messages is important—for example, data ingestion. Kafka and Amazon Kinesis are examples of messaging systems that use streaming semantics for consuming messages.
  13. PULSAR PUB/SUB MODEL Producer Consumer Publisher sends data and doesn't know about the subscribers or their status. All interactions go through Pulsar, and it handles all communication. Subscriber receives data from publisher and never directly interacts with it. Topic Topic
  14. Streaming Consumer Consumer Consumer Subscription Shared Failover Consumer Consumer Subscription In case of failure in Consumer B-0 Consumer Consumer Subscription Exclusive X Consumer Consumer Key-Shared Subscription Pulsar Topic/Partition Messaging UNIFIED MESSAGING MODEL
  15. Building Microservices Asynchronous Communication Building Real Time Applications Highly Resilient Tiered storage 17 PULSAR BENEFITS
  16. ● “Bookies” ● Stores messages and cursors ● Messages are grouped in segments/ledgers ● A group of bookies form an “ensemble” to store a ledger ● “Brokers” ● Handles message routing and connections ● Stateless, but with caches ● Automatic load-balancing ● Topics are composed of multiple segments ● ● Stores metadata for both Pulsar and BookKeeper ● Service discovery Store Messages Metadata & Service Discovery Metadata & Service Discovery KEY PULSAR CONCEPTS: ARCHITECTURE MetaData Storage
  17. Component Description Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas. Key Messages are optionally tagged with keys, used in partitioning and also is useful for things like topic compaction. Properties An optional key/value map of user-defined properties. Producer name The name of the producer who produces the message. If you do not specify a producer name, the default name is used. Message De-Duplication. Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence. Message De-Duplication. MESSAGES - THE BASIC UNIT OF PULSAR
  18. MULTI-TENANCY MODEL Tenants (Compliance) Tenants (Data Services) Namespace (Microservices) Topic-1 (Cust Auth) Topic-1 (Location Resolution) Topic-2 (Demographics) Topic-1 (Budgeted Spend) Topic-1 (Acct History) Topic-1 (Risk Detection) Namespace (ETL) Namespace (Campaigns) Namespace (ETL) Tenants (Marketing) Namespace (Risk Assessment) Pulsar Cluster Topic URI structure: persistent://[tenant]/[namespace]/[topic]
  19. Connectivity • Functions - Lightweight Stream Processing (Java, Python, Go) • Connectors - Sources & Sinks (Cassandra, Kafka, …) • Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT) • Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL • Data Offloaders - Tiered Storage - (S3) hub.streamnative.io
  20. Kafka On Pulsar (KoP)
  21. MQTT On Pulsar (MoP)
  22. AMQP On Pulsar (AoP)
  23. Presto/Trino workers can read segments directly from bookies (or offloaded storage) in parallel. Bookie 1 Segment 1 Producer Consumer Broker 1 Topic1-Part1 Broker 2 Topic1-Part2 Broker 3 Topic1-Part3 Segment 2 Segment 3 Segment 4 Segment X Segment 1 Segment 1 Segment 1 Segment 3 Segment 3 Segment 3 Segment 2 Segment 2 Segment 2 Segment 4 Segment 4 Segment 4 Segment X Segment X Segment X Bookie 2 Bookie 3 Query Coordin ator . . . . . . SQL Worker SQL Worker SQL Worker SQL Worker Query Topic Metadata Pulsar SQL
  24. SCHEMA REGISTRY Schema Registry schema-1 (value=Avro/Protobuf/JSON) schema-2 (value=Avro/Protobuf/JSON) schema-3 (value=Avro/Protobuf/JSON) Schema Data ID Local Cache for Schemas + Schema Data ID + Local Cache for Schemas Send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID Send (register) schema (if not in local cache) Read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID Get schema by ID (if not in local cache) Producers Consumers
  25. APACHE PULSAR TO IO SINK https://pulsar.apache.org/docs/en/io-overview/
  26. ● Consume messages from one or more Pulsar topics. ● Apply user-supplied processing logic to each message. ● Publish the results of the computation to another topic. ● Support multiple programming languages (Java, Python, Go) ● Can leverage 3rd-party libraries to support the execution of ML models on the edge. PULSAR FUNCTIONS
  27. PULSAR IO FUNCTIONS IN PYTHON https://github.com/tspannhw/pulsar-pychat-function
  28. ● Apache Pulsar’s two-tier architecture separates the compute and storage layers, and interact with one another over a TCP/IP connection. This allows us to run the computing layer (Broker) on either Edge servers or IoT Gateway devices. ● Pulsar’s serverless computing framework, know as Pulsar Functions, can run inside the Broker as threads. Effectively “stretching” the data processing layer. EDGE COMPUTING WITH PULSAR
  29. ● Pulsar’s Serverless computing framework can run inside the Pulsar Broker as a thread pool. This framework can be used as the execution environment for ML models. ● The Apache Pulsar Broker supports the MQTT protocol and therefore can directly receive incoming data from the sensor hubs and store it in a topic. BENEFITS OF EDGE COMPUTING WITH PULSAR
  30. PULSAR FUNCTION – 3RD PARTY LIBRARIES ● You can leverage 3rd party libraries within Pulsar Functions ● DeepLearning4J ● JPMML ● DJL.AI ● Keras ● Pulsar Functions are able to support: ● A variety of ML model types. ● Models developed with different languages and toolkits
  31. INTERACTION WITH MODEL SERVERS ● You can write an application in any language that works with Apache Pulsar via libraries and calls any model server (AWSLabs Multi-Model Server, Cloudera Data Science Workbench Model Server and many others) via REST or other protocols. ● Pulsar Functions can execute local models or call model servers.
  32. ML • Visual Question and Answer • NLP (Natural Language Processing) • Sentiment Analysis • Text Classification • Named Entity Recognition • Content-based Recommendations • Predictive Maintenance • Fault Detection • Fraud Detection • Time-Series Predictions • Naive Bayes
  33. APACHE MXNET MODEL ZOO • CaffeNet • SqueezeNet v1.1 • Inception v3 • Single Shot Detection (SSD) • VGG16 • VGG19 • ResidualNet 152 • LSTM https://mxnet.apache.org/api/python/docs/api/gluon/model_zoo/index.html
  34. WHAT IS APACHE NiFi?
  35. APACHE NIFI PULSAR CONNECTOR https://streamnative.io/apache-nifi-connector/
  36. WHY APACHE NIFI? https://www.influxdata.com/integration/mqtt-monitoring/ https://www.influxdata.com/integration/mqtt-monitoring/ • Guaranteed delivery • Data buffering - Backpressure - Pressure release • Prioritized queuing • Flow specific QoS - Latency vs. throughput - Loss tolerance • Data provenance • Supports push and pull models • Hundreds of processors • Visual command and control • Over a 300 sources • Flow templates • Pluggable/multi-role security • Designed for extension • Clustering • Version Control
  37. APACHE NIFI PULSAR CONNECTOR https://github.com/streamnative/pulsar-nifi-bundle
  38. APACHE NIFI PULSAR CONNECTOR
  39. APACHE NIFI PULSAR CONNECTOR
  40. APACHE NIFI PULSAR CONNECTOR
  41. WHAT IS APACHE FLINK?
  42. ● Unified computing engine ● Batch processing is a special case of stream processing ● Stateful processing ● Massive Scalability ● Flink SQL for queries, inserts against Pulsar Topics ● Streaming Analytics ● Continuous SQL ● Continuous ETL ● Complex Event Processing ● Standard SQL Powered by Apache Calcite APACHE FLINK
  43. https://flink.apache.org/2019/05/03/pulsar-flink.html https://github.com/streamnative/pulsar-flink https://streamnative.io/en/blog/release/2021-04-20-flink-sql-on- streamnative-cloud FLINK + PULSAR
  44. SQL select aqi, parameterName, dateObserved, hourObserved, latitude, longitude, localTimeZone, stateCode, reportingArea from airquality select max(aqi) as MaxAQI, parameterName, reportingArea from airquality group by parameterName, reportingArea select max(aqi) as MaxAQI, min(aqi) as MinAQI, avg(aqi) as AvgAQI, count(aqi) as RowCount, parameterName, reportingArea from airquality group by parameterName, reportingArea
  45. FLINK SQL
  46. WHAT IS APACHE SPARK?
  47. SPARK + PULSAR https://pulsar.apache.org/docs/en/adaptors-spark/ val dfPulsar = spark.readStream.format("pulsar") .option("service.url", "pulsar://pulsar1:6650") .option("admin.url", "http://pulsar1:8080") .option("topic", "persistent://public/default/airquality").load() val pQuery = dfPulsar.selectExpr("*") .writeStream.format("console") .option("truncate", false).start() ____ __ / __/__ ___ _____/ /__ _ / _ / _ `/ __/ '_/ /___/ .__/_,_/_/ /_/_ version 3.2.0 /_/ Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 11.0.11)
  48. DEMO TIME
  49. EDGE FLOW
  50. EDGE FLOW EXAMPLE 2
  51. ● BUFFER ● BATCH ● ROUTE ● FILTER ● AGGREGATE ● ENRICH ● REPLICATE ● DEDUPE ● DECOUPLE ● DISTRIBUTE
  52. MODEL CLASSIFICATION
  53. https://medium.com/@tspann/building-a-simple-streaming-real-time-chat-app-7f3b2cf1561d https://streamnative.io/blog/engineering/2022-03-17-streaming-real-time-chat-messages-into-scylla-with-apache-pulsar/
  54. MORE RESOURCES
  55. https://streamnative.io/blog/engineering/2021-11-17-building-edge-applications-with-apache-pulsar/
  56. DEPLOYING AI WITH AN EVENT-DRIVEN PLATFORM https://dzone.com/articles/deploying-ai-with-an- event-driven-platform
  57. FLIP STACK WEEKLY This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends. https://bit.ly/32dAJft
  58. FREE BOOK! https://streamnative.io/download/manning -ebook-apache-pulsar-in-action
  59. Passionate and dedicated team. Founded by the original developers of Apache Pulsar. StreamNative helps teams to capture, manage, and leverage data using Pulsar’s unified messaging and streaming platform.
  60. Apache Pulsar Training • Instructor-led courses • Pulsar Fundamentals • Pulsar Developers • Pulsar Operations • On-demand learning with labs • 300+ engineers, admins and architects trained! STREAMNATIVE ACADEMY Now Available On-Demand Pulsar Training Academy.StreamNative.i o
  61. Hosted by Save Your Spot Now Use code SUMMIT20 to get 20% off. Pulsar Summit San Francisco Hotel Nikko August 18 2022 Summit Highlights: ● 5 Keynotes ● 12 Breakout Sessions ● 1 Amazing Happy Hour Speakers from:
  62. Pulsar Summit San Francisco Sponsorship Prospectus Community Sponsorships Available Help engage and connect the Apache Pulsar community by becoming an official sponsor for Pulsar Summit San Francisco 2022! Learn more about the requirements and benefits of becoming a community sponsor. Hosted by
  63. [Webinar] Building Microservices Watch Now Learn More [Blog post] Event-Driven Microservices
  64. CODE ● https://github.com/tspannhw/FLiP-Pi-BreakoutGarden ● https://github.com/tspannhw/FLiP-Pi-Thermal ● https://github.com/tspannhw/FLiP-Pi-Weather ● https://github.com/tspannhw/FLiP-RP400 ● https://github.com/tspannhw/FLiP-Py-Pi-GasThermal ● https://github.com/tspannhw/FLiP-PY-FakeDataPulsar ● https://github.com/tspannhw/FLiP-Py-Pi-EnviroPlus ● https://github.com/tspannhw/PythonPulsarExamples ● https://github.com/tspannhw/pulsar-pychat-function ● https://github.com/tspannhw/FLiP-PulsarDevPython101 ● https://github.com/tspannhw/nifi-djlsentimentanalysis-pr ocessor
  65. Let’s Keep in Touch! Tim Spann Developer Advocate PaaSDev https://www.linkedin.com/in/timothyspann https://github.com/tspannhw
Publicité