Contenu connexe

Similaire à Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)(20)


Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)

  1. Apache Pulsar Development 101 With Python Tim Spann | Developer Advocate
  2. ● What is Apache Pulsar? ● Python 3 Coding ● Python Consumers ● Python Producers ● Python via MQTT, Web Sockets, Kafka ● Python for Pulsar Functions ● Schemas
  3. Tim Spann Developer Advocate Tim Spann, Developer Advocate at StreamNative ● FLiP(N) Stack = Flink, Pulsar and NiFI Stack ● Streaming Systems & Data Architecture Expert ● Experience: ○ 15+ years of experience with streaming technologies including Pulsar, Flink, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and more. ○ Today, he helps to grow the Pulsar community sharing rich technical knowledge and experience at both global conferences and through individual conversations.
  4. Passionate and dedicated team. Founded by the original developers of Apache Pulsar. StreamNative helps teams to capture, manage, and leverage data using Pulsar’s unified messaging and streaming platform.
  5. FLiP Stack Weekly This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends.
  6. Apache Pulsar Training ● Instructor-led courses ○ Pulsar Fundamentals ○ Pulsar Developers ○ Pulsar Operations ● On-demand learning with labs ● 300+ engineers, admins and architects trained! StreamNative Academy Now Available On-Demand Pulsar Training
  7. Why Apache Pulsar? Unified Messaging Platform Guaranteed Message Delivery Resiliency Infinite Scalability
  8. A Unified Messaging Platform Message Queuing Data Streaming
  9. ● “Bookies” ● Stores messages and cursors ● Messages are grouped in segments/ledgers ● A group of bookies form an “ensemble” to store a ledger ● “Brokers” ● Handles message routing and connections ● Stateless, but with caches ● Automatic load-balancing ● Topics are composed of multiple segments ● ● Stores metadata for both Pulsar and BookKeeper ● Service discovery Store Messages Metadata & Service Discovery Metadata & Service Discovery Pulsar Cluster Metadata Store (ZK, RocksDB, etcd, …)
  10. Pulsar’s Publish-Subscribe model Broker Subscription Consumer 1 Consumer 2 Consumer 3 Topic Producer 1 Producer 2 ● Producers send messages. ● Topics are an ordered, named channel that producers use to transmit messages to subscribed consumers. ● Messages belong to a topic and contain an arbitrary payload. ● Brokers handle connections and routes messages between producers / consumers. ● Subscriptions are named configuration rules that determine how messages are delivered to consumers. ● Consumers receive messages.
  11. Messages - the Basic Unit of Pulsar Component Description Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas. Key Messages are optionally tagged with keys, used in partitioning and also is useful for things like topic compaction. Properties An optional key/value map of user-defined properties. Producer name The name of the producer who produces the message. If you do not specify a producer name, the default name is used. Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence.
  12. Connectivity • Functions - Lightweight Stream Processing (Java, Python, Go) • Connectors - Sources & Sinks (Cassandra, Kafka, …) • Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT) • Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL • Data Offloaders - Tiered Storage - (S3)
  13. Schema Registry Schema Registry schema-1 (value=Avro/Protobuf/JSON) schema-2 (value=Avro/Protobuf/JSON) schema-3 (value=Avro/Protobuf/JSON) Schema Data ID Local Cache for Schemas + Schema Data ID + Local Cache for Schemas Send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID Send (register) schema (if not in local cache) Read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID Get schema by ID (if not in local cache) Producers Consumers
  14. MQTT On Pulsar (MoP)
  15. Kafka On Pulsar (KoP)
  16. Presto/Trino workers can read segments directly from bookies (or offloaded storage) in parallel. Bookie 1 Segment 1 Producer Consumer Broker 1 Topic1-Part1 Broker 2 Topic1-Part2 Broker 3 Topic1-Part3 Segment 2 Segment 3 Segment 4 Segment X Segment 1 Segment 1 Segment 1 Segment 3 Segment 3 Segment 3 Segment 2 Segment 2 Segment 2 Segment 4 Segment 4 Segment 4 Segment X Segment X Segment X Bookie 2 Bookie 3 Query Coordin ator . . . . . . SQL Worker SQL Worker SQL Worker SQL Worker Query Topic Metadata Pulsar SQL
  17. ● Buffer ● Batch ● Route ● Filter ● Aggregate ● Enrich ● Replicate ● Dedupe ● Decouple ● Distribute
  18. Streaming FLiP-Py Apps StreamNative Hub StreamNative Cloud Unified Batch and Stream COMPUTING Batch (Batch + Stream) Unified Batch and Stream STORAGE Offload (Queuing + Streaming) Tiered Storage Pulsar --- KoP --- MoP --- Websocket Pulsar Sink Streaming Edge Gateway Protocols CDC Apps
  19. Pulsar Functions ● Lightweight computation similar to AWS Lambda. ● Specifically designed to use Apache Pulsar as a message bus. ● Function runtime can be located within Pulsar Broker. ● Python Functions A serverless event streaming framework
  20. ● Consume messages from one or more Pulsar topics. ● Apply user-supplied processing logic to each message. ● Publish the results of the computation to another topic. ● Support multiple programming languages (Java, Python, Go) ● Can leverage 3rd-party libraries to support the execution of ML models on the edge. Pulsar Functions
  21. Function Mesh Pulsar Functions, along with Pulsar IO/Connectors, provide a powerful API for ingesting, transforming, and outputting data. Function Mesh, another StreamNative project, makes it easier for developers to create entire applications built from sources, functions, and sinks all through a declarative API.
  22. Python 3 Coding Code Along With Tim <<DEMO>>
  23. Run a Local Standalone Bare Metal wget n.tar.gz tar xvfz apache-pulsar-2.9.1-bin.tar.gz cd apache-pulsar-2.9.1 bin/pulsar standalone (For Pulsar SQL Support) bin/pulsar sql-worker start
  24. <or> Run in StreamNative Cloud Scan the QR code to earn $200 in cloud credit
  25. Building Tenant, Namespace, Topics bin/pulsar-admin tenants create conference bin/pulsar-admin namespaces create conference/pythonweb bin/pulsar-admin tenants list bin/pulsar-admin namespaces list conference bin/pulsar-admin topics create persistent://conference/pythonweb/first bin/pulsar-admin topics list conference/pythonweb
  26. Install Python 3 Pulsar Client pip3 install pulsar-client=='2.9.1[all]' # Depending on Platform May Need to Build C++ Client For Python on Pulsar on Pi
  27. Building a Python 3 Producer import pulsar client = pulsar.Client('pulsar://localhost:6650') producer = client.create_producer('persistent://conference/pythonweb/first') producer.send(('Simple Text Message').encode('utf-8')) client.close()
  28. Building a Python 3 Cloud Producer Oath python3 -su pulsar+ssl:// -t persistent://public/default/pyth --auth-params '{"issuer_url":"", "private_key":"my.json", "audience":"urn:sn:pulsar:name:myclustr"}' from pulsar import Client, AuthenticationOauth2 parse = argparse.ArgumentParser(') parse.add_argument('-su', '--service-url', dest='service_url', type=str, required=True) args = parse.parse_args() client = pulsar.Client(args.service_url, authentication=AuthenticationOauth2(args.auth_params))
  29. Example Avro Schema Usage import pulsar from pulsar.schema import * from pulsar.schema import AvroSchema class thermal(Record): uuid = String() client = pulsar.Client('pulsar://pulsar1:6650') thermalschema = AvroSchema(thermal) producer = client.create_producer(topic='persistent://public/default/pi-thermal-avro', schema=thermalschema,properties={"producer-name": "thrm" }) thermalRec = thermal() thermalRec.uuid = "unique-name" producer.send(thermalRec,partition_key=uniqueid)
  30. Example Json Schema Usage import pulsar from pulsar.schema import * from pulsar.schema import JsonSchema class weather(Record): uuid = String() client = pulsar.Client('pulsar://pulsar1:6650') wschema = JsonSchema(thermal) producer = client.create_producer(topic='persistent://public/default/weathe r,schema=wschema,properties={"producer-name": "wthr" }) weatherRec = weather() weatherRec.uuid = "unique-name" producer.send(weatherRec,partition_key=uniqueid)
  31. Building a Python3 Consumer import pulsar client = pulsar.Client('pulsar://localhost:6650') consumer = client.subscribe('persistent://conference/pythonweb/first',subscription_na me='my-sub') while True: msg = consumer.receive() print("Received message: '%s'" % consumer.acknowledge(msg) client.close()
  32. MQTT from Python pip3 install paho-mqtt import paho.mqtt.client as mqtt client = mqtt.Client("rpi4-iot") row = { } row['gasKO'] = str(readings) json_string = json.dumps(row) json_string = json_string.strip() client.connect("", 1883, 180) client.publish("persistent://public/default/mqtt-2", payload=json_string,qos=0,retain=True)
  33. Web Sockets from Python pip3 install websocket-client import websocket, base64, json topic = 'ws://server:8080/ws/v2/producer/persistent/public/default/webtopic1' ws = websocket.create_connection(topic) message = "Hello Python Web Conference" message_bytes = message.encode('ascii') base64_bytes = base64.b64encode(message_bytes) base64_message = base64_bytes.decode('ascii') ws.send(json.dumps({'payload' : base64_message,'properties': {'device' : 'jetson2gb','protocol' : 'websockets'},'context' : 5})) response = json.loads(ws.recv())
  34. Kafka from Python pip3 install kafka-python from kafka import KafkaProducer from kafka.errors import KafkaError row = { } row['gasKO'] = str(readings) json_string = json.dumps(row) json_string = json_string.strip() producer = KafkaProducer(bootstrap_servers='pulsar1:9092',retries=3) producer.send('topic-kafka-1', json.dumps(row).encode('utf-8')) producer.flush()
  35. Pulsar IO Functions in Python
  36. Pulsar IO Functions in Python bin/pulsar-admin functions create --auto-ack true --py py/src/ --classname "sentiment.Chat" --inputs "persistent://public/default/chat" --log-topic "persistent://public/default/logs" --name Chat --output "persistent://public/default/chatresult"
  37. Pulsar IO Functions in Python from pulsar import Function import json class Chat(Function): def __init__(self): pass def process(self, input, context): logger = context.get_logger() msg_id = context.get_message_id() fields = json.loads(input)
  38. Python For Pulsar on Pi ● ● ● ● ● ● ● ● ●
  39. Connect with the Community ● Join the Pulsar Slack channel - ● Follow @streamnativeio and @apache_pulsar on Twitter ● Subscribe to Monthly Pulsar Newsletter for major news, events, project updates, and resources in the Pulsar community 40
  40. Let’s Keep in Touch Tim Spann Developer Advocate @PassDev
  41. Pulsar Subscription Modes Different subscription modes have different semantics: Exclusive/Failover - guaranteed order, single active consumer Shared - multiple active consumers, no order Key_Shared - multiple active consumers, order for given key Producer 1 Producer 2 Pulsar Topic Subscription D Consumer D-1 Consumer D-2 Key-Shared < K 1, V 10 > < K 1, V 11 > < K 1, V 12 > < K 2 ,V 2 0 > < K 2 ,V 2 1> < K 2 ,V 2 2 > Subscription C Consumer C-1 Consumer C-2 Shared < K 1, V 10 > < K 2, V 21 > < K 1, V 12 > < K 2 ,V 2 0 > < K 1, V 11 > < K 2 ,V 2 2 > Subscription A Consumer A Exclusive Subscription B Consumer B-1 Consumer B-2 In case of failure in Consumer B-1 Failover
  42. <or> Run in Docker docker run -it -p 6650:6650 -p 8080:8080 --mount source=pulsardata,target=/pulsar/data --mount source=pulsarconf,target=/pulsar/conf apachepulsar/pulsar:2.9.1 bin/pulsar standalone
  43. <or> Run in K8
  44. Using NVIDIA Jetson Devices With Pulsar e-part-1-of-3-nvidia-jetson-xavier-nx-595k new-nvidia.html 39b51a06180d98545c1e0542/python3/
  45. Other Applications and Code ● ● ● ● ● ● ● ● ● ●
  46. ### --- Kafka-on-Pulsar KoP (Example from standalone.conf) messagingProtocols=mqtt,kafka allowAutoTopicCreationType=partitioned kafkaListeners=PLAINTEXT:// brokerEntryMetadataInterceptors=org.apache.pulsar.common.intercept.AppendIndexMetad ataInterceptor brokerDeleteInactiveTopicsEnabled=false kopAllowedNamespaces=true requestTimeoutMs=60000 groupMaxSessionTimeoutMs=600000 ### --- Kafka-on-Pulsar KoP (end) 47
  47. Cleanup bin/pulsar-admin topics delete persistent://meetup/newjersey/first bin/pulsar-admin namespaces delete meetup/newjersey bin/pulsar-admin tenants delete meetup
  48. Messaging Ordering Guarantees To guarantee message ordering, architect Pulsar to take advantage of subscription modes and topic ordering guarantees. Topic Ordering Guarantees ● Messages sent to a single topic or partition DO have an ordering guarantee. ● Messages sent to different partitions DO NOT have an ordering guarantee. Subscription Mode Guarantees ● A single consumer can receive messages from the same partition in order using an exclusive or failover subscription mode. ● Multiple consumers can receive messages from the same key in order using the key_shared subscription mode.