Contenu connexe

Similaire à The Dream Stream Team for Pulsar and Spring(20)


Plus de Timothy Spann(20)


The Dream Stream Team for Pulsar and Spring

  1. The Dream Stream Team for Pulsar & Spring Tim Spann | Developer Advocate
  2. Tim Spann Developer Advocate ● FLiP(N) Stack = Flink, Pulsar and NiFi Stack ● Streaming Systems/ Data Architect ● Experience: ○ 15+ years of experience with batch and streaming technologies including Pulsar, Flink, Spark, NiFi, Spring, Java, Big Data, Cloud, MXNet, Hadoop, Datalakes, IoT and more.
  3. Our bags are packed. Let’s begin the Journey to Real-Time Unified Messaging And Streaming
  5. ● Introduction ● What is Apache Pulsar? ● Spring Apps ○ Native Pulsar ○ AMQP ○ MQTT ○ Kafka ● Demo ● Q&A This is the cover art for the vinyl LP "Unknown Pleasures" by the artist Joy Division. The cover art copyright is believed to belong to the label, Factory Records, or the graphic artist(s).
  6. 101 Unified Messaging Platform Guaranteed Message Delivery Resiliency Infinite Scalability
  7. WHERE?
  8. Pulsar Global Adoption
  9. WHAT?
  10. Streaming Consumer Consumer Consumer Subscription Shared Failover Consumer Consumer Subscription In case of failure in Consumer B-0 Consumer Consumer Subscription Exclusive X Consumer Consumer Key-Shared Subscription Pulsar Topic/Partition Messaging Unified Messaging Model
  11. Tenants / Namespaces / Topics Tenants (Compliance) Tenants (Data Services) Namespace (Microservices) Topic-1 (Cust Auth) Topic-1 (Location Resolution) Topic-2 (Demographics) Topic-1 (Budgeted Spend) Topic-1 (Acct History) Topic-1 (Risk Detection) Namespace (ETL) Namespace (Campaigns) Namespace (ETL) Tenants (Marketing) Namespace (Risk Assessment) Pulsar Instance Pulsar Cluster
  12. Component Description Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas. Key Messages are optionally tagged with keys, used in partitioning and also is useful for things like topic compaction. Properties An optional key/value map of user-defined properties. Producer name The name of the producer who produces the message. If you do not specify a producer name, the default name is used. Message De-Duplication. Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence. Message De-Duplication. Messages - The Basic Unit of Pulsar
  13. Pulsar Subscription Modes Different subscription modes have different semantics: Exclusive/Failover - guaranteed order, single active consumer Shared - multiple active consumers, no order Key_Shared - multiple active consumers, order for given key Producer 1 Producer 2 Pulsar Topic Subscription D Consumer D-1 Consumer D-2 Key-Shared < K 1, V 10 > < K 1, V 11 > < K 1, V 12 > < K 2 ,V 2 0 > < K 2 ,V 2 1> < K 2 ,V 2 2 > Subscription C Consumer C-1 Consumer C-2 Shared < K 1, V 10 > < K 2, V 21 > < K 1, V 12 > < K 2 ,V 2 0 > < K 1, V 11 > < K 2 ,V 2 2 > Subscription A Consumer A Exclusive Subscription B Consumer B-1 Consumer B-2 In case of failure in Consumer B-1 Failover
  14. Flexible Pub/Sub API for Pulsar - Shared Consumer consumer = client.newConsumer() .topic("my-topic") .subscriptionName("work-q-1") .subscriptionType(SubType.Shared) .subscribe();
  15. Flexible Pub/Sub API for Pulsar - Failover Consumer consumer = client.newConsumer() .topic("my-topic") .subscriptionName("stream-1") .subscriptionType(SubType.Failover) .subscribe();
  16. Reader Interface byte[] msgIdBytes = // Some byte array MessageId id = MessageId.fromByteArray(msgIdBytes); Reader<byte[]> reader = pulsarClient.newReader() .topic(topic) .startMessageId(id) .create(); Create a reader that will read from some message between earliest and latest. Reader
  17. Connectivity • Libraries - (Java, Python, Go, NodeJS, WebSockets, C++, C#, Scala, Kotlin,...) • Functions - Lightweight Stream Processing (Java, Python, Go) • Connectors - Sources & Sinks (Cassandra, Kafka, …) • Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT), RoP (RocketMQ) • Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL • Data Offloaders - Tiered Storage - (S3)
  18. Pulsar SQL Presto/Trino workers can read segments directly from bookies (or offloaded storage) in parallel. Bookie 1 Segment 1 Producer Consumer Broker 1 Topic1-Part1 Broker 2 Topic1-Part2 Broker 3 Topic1-Part3 Segment 2 Segment 3 Segment 4 Segment X Segment 1 Segment 1 Segment 1 Segment 3 Segment 3 Segment 3 Segment 2 Segment 2 Segment 2 Segment 4 Segment 4 Segment 4 Segment X Segment X Segment X Bookie 2 Bookie 3 Query Coordinator ... ... SQL Worker SQL Worker SQL Worker SQL Worker Query Topic Metadata
  19. Kafka On Pulsar (KoP)
  20. MQTT On Pulsar (MoP)
  21. AMQP On Pulsar (AoP)
  22. RocketMQ On Pulsar (RoP)
  23. Schema Registry Schema Registry schema-1 (value=Avro/Protobuf/JSON) schema-2 (value=Avro/Protobuf/JSON) schema-3 (value=Avro/Protobuf/JSON) Schema Data ID Local Cache for Schemas + Schema Data ID + Local Cache for Schemas Send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID Send (register) schema (if not in local cache) Read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID Get schema by ID (if not in local cache) Producers Consumers
  24. Pulsar Functions ● Lightweight computation similar to AWS Lambda. ● Specifically designed to use Apache Pulsar as a message bus. ● Function runtime can be located within Pulsar Broker. A serverless event streaming framework
  25. import org.apache.pulsar.client.impl.schema.JSONSchema; import org.apache.pulsar.functions.api.*; public class AirQualityFunction implements Function<byte[], Void> { @Override public Void process(byte[] input, Context context) { context.getLogger().debug("DBG:” + new String(input)); context.newOutputMessage(“topicname”, JSONSchema.of(Observation.class)) .key(UUID.randomUUID().toString()) .property(“prop1”, “value1”) .value(observation) .send(); } } Your Code Here Pulsar Function SDK
  26. ● Consume messages from one or more Pulsar topics. ● Apply user-supplied processing logic to each message. ● Publish the results of the computation to another topic. ● Support multiple programming languages (Java, Python, Go) ● Can leverage 3rd-party libraries to support the execution of ML models on the edge. Pulsar Functions
  27. You can start really small! ● A Docker container ● A Standalone Node ● Small MiniKube Pods ● Free Tier SN Cloud
  28. Pulsar - Spring
  29. Spring - Kafka - Pulsar @Bean public KafkaTemplate<String, Observation> kafkaTemplate() { KafkaTemplate<String, Observation> kafkaTemplate = new KafkaTemplate<String, Observation>(producerFactory()); return kafkaTemplate; } ProducerRecord<String, Observation> producerRecord = new ProducerRecord<>(topicName, uuidKey.toString(), message); kafkaTemplate.send(producerRecord);
  30. Spring - MQTT - Pulsar @Bean public IMqttClient mqttClient( @Value("${mqtt.clientId}") String clientId, @Value("${mqtt.hostname}") String hostname, @Value("${mqtt.port}") int port) throws MqttException { IMqttClient mqttClient = new MqttClient( "tcp://" + hostname + ":" + port, clientId); mqttClient.connect(mqttConnectOptions()); return mqttClient; } MqttMessage mqttMessage = new MqttMessage(); mqttMessage.setPayload(DataUtility.serialize(payload)); mqttMessage.setQos(0); mqttMessage.setRetained(true); mqttClient.publish(topicName, mqttMessage);
  31. Spring - AMQP - Pulsar rabbitTemplate.convertAndSend(topicName, DataUtility.serializeToJSON(observation)); @Bean public CachingConnectionFactory connectionFactory() { CachingConnectionFactory ccf = new CachingConnectionFactory(); ccf.setAddresses(serverName); return ccf; }
  32. Spring - Websockets - Pulsar
  33. Spring - Presto SQL/Trino - Pulsar via Presto or JDBC
  34. Reactive Pulsar pring-boot pring-boot <dependencies> <dependency> <groupId>com.github.lhotari</groupId> <artifactId>reactive-pulsar-adapter</artifactId> <version>0.2.0</version> </dependency> </dependencies>
  35. REST + Spring Boot + Pulsar + Friends
  36. StreamNative Hub StreamNative Cloud Unified Batch and Stream STORAGE Offload (Queuing + Streaming) Apache Pulsar - Spring <-> Events <-> Spring <-> Anywhere Tiered Storage Pulsar --- KoP --- MoP --- AoP --- Websocket Pulsar Sink Pulsar Sink Data Apps Protocols Data Driven Apps Micro Service (Queuing + Streaming)
  37. Demo
  38. WHO?
  39. Let’s Keep in Touch! Tim Spann Developer Advocate @PaaSDev
  40. Apache Pulsar Training ● Instructor-led courses ○ Pulsar Fundamentals ○ Pulsar Developers ○ Pulsar Operations ● On-demand learning with labs ● 300+ engineers, admins and architects trained! StreamNative Academy Now Available On-Demand Pulsar Training
  41. Pulsar Resources Page Learn More r-consuming-data-and-generating-spring-cloud-stream -sink-applications hat-the-flip-is-the-flip-stack/
  42. Spring Things ● ● ● ● ● ● ● ● ● ● ●
  43. FLiP Stack Weekly This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark, Java and Open Source friends.
  44. ● Buffer ● Batch ● Route ● Filter ● Aggregate ● Enrich ● Replicate ● Dedupe ● Decouple ● Distribute
  45. StreamNative Cloud
  46. Passionate and dedicated team. Founded by the original developers of Apache Pulsar. StreamNative helps teams to capture, manage, and leverage data using Pulsar’s unified messaging and streaming platform.
  47. Founded By The Creators Of Apache Pulsar Sijie Guo ASF Member Pulsar/BookKeeper PMC Founder and CEO Jia Zhai Pulsar/BookKeeper PMC Co-Founder Matteo Merli ASF Member Pulsar/BookKeeper PMC CTO Data veterans with extensive industry experience
  48. Notices Apache Pulsar™ Apache®, Apache Pulsar™, Pulsar™, Apache Flink®, Flink®, Apache Spark®, Spark®, Apache NiFi®, NiFi® and the logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks. Copyright © 2021-2022 The Apache Software Foundation. All Rights Reserved. Apache, Apache Pulsar and the Apache feather logo are trademarks of The Apache Software Foundation.
  49. Extra Slides
  50. Built-in Back Pressure Producer<String> producer = client.newProducer(Schema.STRING) .topic("SpringIOBarcelona2022") .blockIfQueueFull(true) // enable blocking if queue is full .maxPendingMessages(10) // max queue size .create(); // During Back Pressure: the sendAsync call blocks // with no room in queues producer.newMessage() .key("mykey") .value("myvalue") .sendAsync(); // can be a blocking call
  51. Producing Object Events From Java ProducerBuilder<Observation> producerBuilder = pulsarClient.newProducer(JSONSchema.of(Observation.class)) .topic(topicName) .producerName(producerName).sendTimeout(60, TimeUnit.SECONDS); Producer<Observation> producer = producerBuilder.create(); msgID = producer.newMessage() .key(someUniqueKey) .value(observation) .send();
  52. Producer-Consumer Producer Consumer Publisher sends data and doesn't know about the subscribers or their status. All interactions go through Pulsar and it handles all communication. Subscriber receives data from publisher and never directly interacts with it Topic Topic
  53. Apache Pulsar: Messaging vs Streaming Message Queueing - Queueing systems are ideal for work queues that do not require tasks to be performed in a particular order. Streaming - Streaming works best in situations where the order of messages is important.
  54. 56 ➔ Perform in Real-Time ➔ Process Events as They Happen ➔ Joining Streams with SQL ➔ Find Anomalies Immediately ➔ Ordering and Arrival Semantics ➔ Continuous Streams of Data DATA STREAMING
  55. Pulsar’s Publish-Subscribe model Broker Subscription Consumer 1 Consumer 2 Consumer 3 Topic Producer 1 Producer 2 ● Producers send messages. ● Topics are an ordered, named channel that producers use to transmit messages to subscribed consumers. ● Messages belong to a topic and contain an arbitrary payload. ● Brokers handle connections and routes messages between producers / consumers. ● Subscriptions are named configuration rules that determine how messages are delivered to consumers. ● Consumers receive messages.
  56. Using Pulsar for Java Apps 58 High performance High security Multiple data consumers: Transactions, CDC, fraud detection with ML Large data volumes, high scalability Multi-tenancy and geo-replication
  57. Accessing historical as well as real-time data Pub/sub model enables event streams to be sent from multiple producers, and consumed by multiple consumers To process large amounts of data in a highly scalable way
  58. Testing via Phone MQTT
  59. Simple Function (Pulsar SDK)
  61. Sensors <-> Streaming Java Apps StreamNative Hub StreamNative Cloud Unified Batch and Stream COMPUTING Batch (Batch + Stream) Unified Batch and Stream STORAGE Offload (Queuing + Streaming) Tiered Storage Pulsar --- KoP --- MoP --- Websocket Pulsar Sink Streaming Edge Gateway Protocols <-> Sensors <-> Apps
  62. StreamNative Hub StreamNative Cloud Unified Batch and Stream COMPUTING Batch (Batch + Stream) Unified Batch and Stream STORAGE Offload (Queuing + Streaming) Apache Flink - Apache Pulsar - Apache NiFi <-> Devices Tiered Storage Pulsar --- KoP --- MoP --- Websocket --- HTTP Pulsar Sink Streaming Edge Gateway Protocols End-to-End Streaming Edge App
  63. ML Java Coding (DJL)
  64. Source Code
  65. import java.util.function.Function; public class MyFunction implements Function<String, String> { public String apply(String input) { return doBusinessLogic(input); } } Your Code Here Pulsar Function Java
  66. Setting Subscription Type Java Consumer<byte[]> consumer = pulsarClient.newConsumer() .topic(topic) .subscriptionName(“subscriptionName") .subscriptionType(SubscriptionType.Shared) .subscribe();
  67. Subscribing to a Topic and setting Subscription Name Java Consumer<byte[]> consumer = pulsarClient.newConsumer() .topic(topic) .subscriptionName(“subscriptionName") .subscribe();
  68. Run a Local Standalone Bare Metal wget bin.tar.gz tar xvfz apache-pulsar-2.10.0-bin.tar.gz cd apache-pulsar-2.10.0 bin/pulsar standalone (For Pulsar SQL Support) bin/pulsar sql-worker start
  69. Building Tenant, Namespace, Topics bin/pulsar-admin tenants create conference bin/pulsar-admin namespaces create conference/spring bin/pulsar-admin tenants list bin/pulsar-admin namespaces list conference bin/pulsar-admin topics create persistent://conference/spring/first bin/pulsar-admin topics list conference/spring
  70. Simple Function (Java Native)
  71. Multiple Sends
  72. Pulsar IO Functions in Java bin/pulsar-admin functions create --auto-ack true --jar a.jar --classname "io.streamnative.Tim" --inputs "persistent://public/default/chat" --log-topic "persistent://public/default/logs" --name Chat --output "persistent://public/default/chatresult"
  73. Pulsar Producer import java.util.UUID; import; import org.apache.pulsar.client.api.Producer; import org.apache.pulsar.client.api.ProducerBuilder; import org.apache.pulsar.client.api.PulsarClient; import org.apache.pulsar.client.api.MessageId; import org.apache.pulsar.client.impl.auth.oauth2.AuthenticationFactoryOAuth2; PulsarClient client = PulsarClient.builder() .serviceUrl(serviceUrl) .authentication( AuthenticationFactoryOAuth2.clientCredentials( new URL(issuerUrl), new URL(credentialsUrl.), audience)) .build();
  74. Pulsar Simple Producer String pulsarKey = UUID.randomUUID().toString(); String OS = System.getProperty("").toLowerCase(); ProducerBuilder<byte[]> producerBuilder = client.newProducer().topic(topic) .producerName("demo"); Producer<byte[]> producer = producerBuilder.create(); MessageId msgID = producer.newMessage().key(pulsarKey).value("msg".getBytes()) .property("device",OS).send(); producer.close(); client.close();
  75. Producer Steps (Async) PulsarClient client = PulsarClient.builder() // Step 1 Create a client. .serviceUrl("pulsar://broker1:6650") .build(); // Client discovers all brokers // Step 2 Create a producer from client. Producer<String> producer = client.newProducer(Schema.STRING) .topic("hellotopic") .create(); // Topic lookup occurs, producer registered with broker // Step 3 Message is built, which will be buffered in async. CompletableFuture<String> msgFuture = producer.newMessage() .key("mykey") .value("myvalue") .sendAsync(); // Client returns when message is stored in buffer // Step 4 Message is flushed (usually automatically). producer.flush(); // Step 5 Future returns once the message is sent and ack is returned. msgFuture.get();
  76. Producer Steps (Sync) // Step 1 Create a client. PulsarClient client = PulsarClient.builder() .serviceUrl("pulsar://broker1:6650") .build(); // Client discovers all brokers // Step 2 Create a producer from client. Producer<String> producer = client.newProducer(Schema.STRING) .topic("persistent://public/default/hellotopic") .create(); // Topic lookup occurs, producer registered with broker. // Step 3 Message is built, which will be sent immediately. producer.newMessage() .key("mykey") .value("myvalue") .send(); // client waits for send and ack
  77. Subscribing to a Topic and setting Subscription Name Java Consumer<byte[]> consumer = pulsarClient.newConsumer() .topic(topic) .subscriptionName(“subscriptionName") .subscribe();
  78. Fanout, Queueing, or Streaming In Pulsar, you have flexibility to use different subscription modes. If you want to... Do this... And use these subscription modes... achieve fanout messaging among consumers specify a unique subscription name for each consumer exclusive (or failover) achieve message queuing among consumers share the same subscription name among multiple consumers shared allow for ordered consumption (streaming) distribute work among any number of consumers exclusive, failover or key shared
  79. Setting Subscription Type Java Consumer<byte[]> consumer = pulsarClient.newConsumer() .topic(topic) .subscriptionName(“subscriptionName") .subscriptionType(SubscriptionType.Shared) .subscribe();
  80. Delayed Message Producing // Producer creation like normal Producer<String> producer = client.newProducer(Schema.STRING) .topic("hellotopic") .create(); // With an explicit time producer.newMessage() .key("mykey") .value("myvalue") .deliverAt(millisSinceEpoch) .send(); // Relative time producer.newMessage() .key("mykey") .value("myvalue") .deliverAfter(1, TimeUnit.HOURS) .send();
  81. Messaging versus Streaming Messaging Use Cases Streaming Use Cases Service x commands service y to make some change. Example: order service removing item from inventory service Moving large amounts of data to another service (real-time ETL). Example: logs to elasticsearch Distributing messages that represent work among n workers. Example: order processing not in main “thread” Periodic jobs moving large amounts of data and aggregating to more traditional stores. Example: logs to s3 Sending “scheduled” messages. Example: notification service for marketing emails or push notifications Computing a near real-time aggregate of a message stream, split among n workers, with order being important. Example: real-time analytics over page views
  82. Differences in consumption Messaging Use Case Streaming Use Case Retention The amount of data retained is relatively small - typically only a day or two of data at most. Large amounts of data are retained, with higher ingest volumes and longer retention periods. Throughput Messaging systems are not designed to manage big “catch-up” reads. Streaming systems are designed to scale and can handle use cases such as catch-up reads.
  83. Producer Routing Modes (when not using a key)
  84. Routing
  85. ### --- Kafka-on-Pulsar KoP (Example from standalone.conf) messagingProtocols=mqtt,kafka allowAutoTopicCreationType=partitioned kafkaListeners=PLAINTEXT:// brokerEntryMetadataInterceptors=org.apache.pulsar.common.intercept.AppendIndexMetad ataInterceptor brokerDeleteInactiveTopicsEnabled=false kopAllowedNamespaces=true requestTimeoutMs=60000 groupMaxSessionTimeoutMs=600000 ### --- Kafka-on-Pulsar KoP (end) 87
  86. Cleanup bin/pulsar-admin topics delete persistent://meetup/newjersey/first bin/pulsar-admin namespaces delete meetup/newjersey bin/pulsar-admin tenants delete meetup
  87. Messaging Ordering Guarantees To guarantee message ordering, architect Pulsar to take advantage of subscription modes and topic ordering guarantees. Topic Ordering Guarantees ● Messages sent to a single topic or partition DO have an ordering guarantee. ● Messages sent to different partitions DO NOT have an ordering guarantee. Subscription Mode Guarantees ● A single consumer can receive messages from the same partition in order using an exclusive or failover subscription mode. ● Multiple consumers can receive messages from the same key in order using the key_shared subscription mode.
  88. What’s Next? Here are resources to continue your journey with Apache Pulsar