Python Web Conference: Cloud Native Apache Pulsar Development 202 with Python
Do you want to develop real-time applications that can turn raw text data into analyzed sentences and smart sentiment? Have you heard of real-time analytics? Let's leverage your Python skills to make it happen now.
In this talk I will show developers how to develop real-time applications that use Pulsar Functions to turn live event data into live NLP results and sentiment-analyzed output.
We will walk through how to set up various scenarios to feed live data from apps and web apps to Apache Pulsar for real-time analytics and NLP.
At the end of the talk, developers will be able to use Python for real-time NLP analytics and will have gained insight into how and when to use various streaming protocols, platforms, libraries, and systems, including Apache Pulsar, Apache Kafka, MQTT, WebSockets, AMQP, and Apache Spark.
https://2023.pythonwebconf.com/presentations/apache-pulsar-development-202-with-python
3.
4. FLiPN-FLaNK Stack
Tim Spann
@PaasDev // Blog: www.datainmotion.dev
Principal Developer Advocate.
Princeton Future of Data Meetup.
ex-Pivotal, ex-Hortonworks, ex-StreamNative, ex-PwC
https://github.com/tspannhw/EverythingApacheNiFi
https://medium.com/@tspann
Apache NiFi x Apache Kafka x Apache Flink
7. Messages - the Basic Unit of Apache Pulsar

● Value / data payload: The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas.
● Key: Messages are optionally tagged with keys, which are used in partitioning and are also useful for things like topic compaction.
● Properties: An optional key/value map of user-defined properties.
● Producer name: The name of the producer who produced the message. If you do not specify a producer name, the default name is used.
● Sequence ID: Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence.
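The fields above can be modeled in plain Python to make the anatomy concrete. This is a local sketch only, not the real client's message class; the `PulsarMessage` dataclass and the hash-based `partition_for_key` helper are illustrative names, not part of the pulsar-client API:

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PulsarMessage:
    # Illustrative sketch of a Pulsar message envelope (not the real client class)
    value: bytes                                    # raw payload bytes; may conform to a schema
    key: Optional[str] = None                       # optional key for partitioning and compaction
    properties: dict = field(default_factory=dict)  # optional user-defined key/value map
    producer_name: str = "default-producer"         # a default name is used if unset
    sequence_id: int = 0                            # position in the producer's ordered sequence

def partition_for_key(key: str, num_partitions: int) -> int:
    # Keyed messages map deterministically to a partition (hash-based sketch)
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

msg = PulsarMessage(value=b"Simple Text Message", key="user-42",
                    properties={"source": "webapp"}, sequence_id=7)
```

Because the hash is deterministic, every message tagged with the same key lands on the same partition, which is what makes keys useful for ordering and topic compaction.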
8. Integrated Schema Registry
[Diagram] Producers and consumers each keep a local cache of schemas. A producer sends (registers) a schema to the Schema Registry if it is not in its local cache, then sends schema-1 / schema-2 / schema-3 (value = Avro/Protobuf/JSON) data serialized per schema ID. A consumer gets the schema by ID if it is not in its local cache, then deserializes the data per schema ID.
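The register/lookup flow in the diagram can be sketched with a tiny in-memory registry. This is an illustrative model of the broker-side behavior, not Pulsar's actual implementation; the `SchemaRegistry` class and its methods are invented for this sketch:

```python
class SchemaRegistry:
    # Minimal in-memory sketch of the broker-side schema registry (illustrative only)
    def __init__(self):
        self._by_id = {}    # schema ID -> schema definition
        self._by_def = {}   # schema definition -> schema ID

    def register(self, schema_def: str) -> int:
        # Producer path: send (register) the schema if not already known, get its ID
        if schema_def not in self._by_def:
            schema_id = len(self._by_id) + 1
            self._by_id[schema_id] = schema_def
            self._by_def[schema_def] = schema_id
        return self._by_def[schema_def]

    def get(self, schema_id: int) -> str:
        # Consumer path: fetch the schema by ID when it is not in the local cache
        return self._by_id[schema_id]

registry = SchemaRegistry()
schema_id = registry.register('{"type": "record", "name": "Chat"}')  # producer registers once
schema_def = registry.get(schema_id)  # consumer resolves the ID to deserialize the payload
```

Registration is idempotent: re-registering the same definition returns the same ID, which is why both sides can rely on their local caches after the first round trip.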
11. Apache Pulsar Ecosystem (hub.streamnative.io)

● Data Offloaders (Tiered Storage)
● Client Libraries
● Connectors (Sources & Sinks)
● Protocol Handlers
● Pulsar Functions (Lightweight Stream Processing)
● Processing Engines
● … and more!
12. Pulsar Functions

● Consume messages from one or more Pulsar topics.
● Apply user-supplied processing logic to each message.
● Publish the results of the computation to another topic.
● Support multiple programming languages (Java, Python, Go).
● Can leverage third-party libraries to support the execution of ML models on the edge.
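The consume → process → publish contract boils down to one method: `process()` takes the message payload and a context and returns the value to publish. A minimal sketch follows; the real base class is `pulsar.Function`, which is stubbed here so the example runs locally without a broker or the pulsar-client package installed:

```python
# Stub of pulsar.Function so this sketch runs without a broker
# or the pulsar-client package; the real class comes from `from pulsar import Function`.
class Function:
    pass

class Uppercase(Function):
    # Consume a message, apply user-supplied logic, and return the result;
    # the Functions runtime publishes the return value to the output topic.
    def process(self, input, context):
        return input.upper()

result = Uppercase().process("hello pulsar", None)
```

Because `process()` is a plain method, function logic like this can be unit-tested before deploying it to a Pulsar cluster.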
13. ML Function (Entire Function)

from pulsar import Function
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import json

class Chat(Function):
    def __init__(self):
        pass

    def process(self, input, context):
        fields = json.loads(input)
        sid = SentimentIntensityAnalyzer()
        ss = sid.polarity_scores(fields["comment"])
        row = {}
        # Take the message ID from the function context
        row['id'] = str(context.get_message_id())
        if ss['compound'] < 0.00:
            row['sentiment'] = 'Negative'
        else:
            row['sentiment'] = 'Positive'
        row['comment'] = str(fields["comment"])
        return json.dumps(row)
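The thresholding logic in this function can be exercised locally without a broker or the vaderSentiment package by stubbing out the analyzer. The `classify` and `process_stub` names below are invented for this sketch; only the threshold (compound < 0.00 means Negative) comes from the function above:

```python
import json

def classify(compound: float) -> str:
    # Same threshold as the function above: compound < 0.00 -> Negative
    return "Negative" if compound < 0.00 else "Positive"

def process_stub(input_json: str, compound: float) -> str:
    # Mirrors the shape of process(): parse the event, score it, re-emit JSON
    # (the compound score is passed in here instead of computed by VADER)
    fields = json.loads(input_json)
    row = {"sentiment": classify(compound), "comment": str(fields["comment"])}
    return json.dumps(row)

out = process_stub(json.dumps({"comment": "this talk is great"}), 0.6)
```

Checking the edge cases this way (exactly 0.00 is Positive, anything below is Negative) is cheap compared to deploying the function and feeding it live traffic.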
14. Install the Python Pulsar Client

pip3 install 'pulsar-client[all]==2.11.0'

# Depending on the platform, the C++ client may need to be built first
For Python on Pulsar on Raspberry Pi: https://github.com/tspannhw/PulsarOnRaspberryPi
https://pulsar.apache.org/docs/en/client-libraries-python/
https://github.com/tspannhw/pulsar-sentiment-function
15. Building a Python Producer

import pulsar

client = pulsar.Client('pulsar://localhost:6650')
producer = client.create_producer('persistent://conf/ete/first')
producer.send(('Simple Text Message').encode('utf-8'))
client.close()
19. Building a Python Consumer

import pulsar

client = pulsar.Client('pulsar://localhost:6650')
consumer = client.subscribe('persistent://conf/ete/first',
                            subscription_name='mine')
while True:
    msg = consumer.receive()
    print("Received message: '%s'" % msg.data())
    consumer.acknowledge(msg)
client.close()
20. ChatGPT Built a Pulsar Function

from pyscylla.cluster import Cluster
from pyscylla.session import Session
from pyscylla.schema import Model, columns
from pyscylla.query import EQ
from pyschema import Schema, fields
import json

class User(Model):
    id = columns.UUID(primary_key=True)
    name = columns.Text()
    email = columns.Text()

class UserSchema(Schema):
    id = fields.UUID()
    name = fields.String()
    email = fields.String()
21. ChatGPT Built a Pulsar Function 2

class Function(pulsar_function.PulsarFunction):
    def process(self, json_message, context):
        # Connect to the Scylla database
        cluster = Cluster(["127.0.0.1"])
        session = cluster.connect()
        # Parse the JSON message using the JSON schema
        user_json = json.loads(json_message)
        user_data = UserSchema().loads(user_json)
        # Look up the user in the Scylla database
        user = User.objects(session).get(id=EQ(user_data["id"]))
        # Update the JSON object with the user data from the database
        user_json["name"] = user.name
        user_json["email"] = user.email
        return json.dumps(user_json)