The document discusses reducing redundant Apache Pulsar producers created by partitioned producers. It presents a solution that limits the number of internal producers per partitioned producer and lazily loads them. Benchmark results show the proposed approach reduces client-side resource usage, such as heap and the number of TCP connections, compared to the existing approach, while having little impact on broker-side resources. The conclusion is that these producer changes can improve efficiency.
I’m glad to attend this great summit as a speaker. Now, let’s start talking about “Reduce redundant producers from partitioned producer”.
(If we have enough time, add a demonstration step.)
Here is today’s agenda. First, I will talk about the background of the issue with producer connections. Second is the solution to the issue. Third is benchmarking of the solution. And last is the conclusion.
Let’s start with the background of the issue.
In Yahoo! JAPAN, Apache Pulsar is used for many use cases, for example, content-update notifications, job queuing, etc.
Also, Pulsar is used in metrics and logs streaming pipeline.
Both metrics and logs are sent to topics from computing instances such as IaaS, PaaS, CaaS, etc. They are received by the metrics or logging platform.
In this case, an unspecified number of producers connect to the partitioned topic from computing instances.
This causes two issues. First, the number of producers exceeds the limit. Second, redundant producers are created per computing instance.
I’ll explain the first issue. Pulsar has a config, maxProducersPerTopic. "Ill-behaved" clients that create producers without limit can make the topic producer-full for everyone.
To solve this issue, we introduced a config that restricts the number of producers and consumers per IP address. I won’t explain the details in this session because it is already merged; if you are interested, please check this link.
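For reference, the merged change exposes broker-side settings along these lines. The property names below are from my reading of broker.conf (maxSameAddressProducersPerTopic and maxSameAddressConsumersPerTopic) and should be checked against your Pulsar version:

```properties
# Limit how many producers/consumers a single IP address may create on one topic.
# 0 means unlimited. Names assumed from broker.conf; verify for your version.
maxSameAddressProducersPerTopic=10
maxSameAddressConsumersPerTopic=10
```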
I’ll explain the second issue. When a producer connects to a partitioned topic, it sometimes has redundant internal producers. Some cases are as below.
First, relatively "low-rate" producers. The number of partitions needs to be increased according to the total throughput. However, creating internal producers for all partitions is inefficient for producers whose throughput is small enough to be handled by a few partitions.
Second, producers using the single-partition routing mode. A partitioned producer creates internal producers for all partitions, but in this case each producer uses only one partition. Therefore, the other internal producers are redundant.
In this session, I will talk about the second one.
Now, let’s talk about the solution.
Here is the concept for solving the issue: reduce the number of producers so that system resources are used more efficiently. As you can see, each partitioned producer connects to only part of the partitions.
When a partitioned producer connects to the topic, the client can randomly choose a limited number of internal producers, and therefore a limited number of partitions as well.
To implement this, I introduced a producer lazy-loading feature and a custom routing mode. From here, I will explain the solutions in detail.
First, at the initialization step, a partitioned producer connects to only one of the partitions for authentication and authorization, instead of all partitions. When each internal producer is created, authentication and authorization are validated for that topic.
Second, at the message-sending step, partitions are chosen by the message router, and each internal producer is created the first time its partition is chosen by the router.
Therefore, the number of internal producers depends on the message routing policy.
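The lazy-loading flow above can be sketched as a toy model. The class and names below are purely illustrative, not the actual Pulsar client code: the point is that an internal producer is created only the first time the router picks its partition.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntUnaryOperator;

// Toy model of producer lazy-loading (illustrative, not Pulsar's real classes).
class LazyPartitionedProducer {
    private final int numPartitions;
    private final IntUnaryOperator router; // message sequence number -> partition index
    private final Map<Integer, String> internalProducers = new ConcurrentHashMap<>();
    private int msgCount = 0;

    LazyPartitionedProducer(int numPartitions, IntUnaryOperator router) {
        this.numPartitions = numPartitions;
        this.router = router;
    }

    // The internal producer for a partition is created only on its first use.
    String send(String msg) {
        int partition = router.applyAsInt(msgCount++) % numPartitions;
        return internalProducers.computeIfAbsent(partition, p -> "internal-producer-" + p);
    }

    // Number of partitions this partitioned producer actually connected to.
    int connectedPartitions() {
        return internalProducers.size();
    }
}
```

With a router that always returns the same index (the SinglePartition case), only one internal producer is ever created, no matter how many partitions the topic has.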
We also added a new custom routing mode, PartialRoundRobinMessageRouter. This mode supports round-robin over a limited number of partitions. In addition, when a producer is created with the SinglePartition routing mode, only one internal producer is created.
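Here is a minimal sketch of the idea behind partial round-robin routing: each producer fixes a random subset of at most `limit` partitions and round-robins over that subset only. This is my own simplified version; the real router in the Java client may differ in details.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of partial round-robin routing (illustrative, simplified).
class PartialRoundRobinRouter {
    private final List<Integer> chosen;          // random subset, fixed per producer
    private final AtomicInteger counter = new AtomicInteger();

    PartialRoundRobinRouter(int numPartitions, int limit) {
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) all.add(i);
        Collections.shuffle(all);                // each producer picks its own subset
        chosen = all.subList(0, Math.min(limit, numPartitions));
    }

    // Round-robin over the chosen subset only, never over all partitions.
    int choosePartition() {
        return chosen.get(counter.getAndIncrement() % chosen.size());
    }
}
```

Because different producers shuffle independently, the partitions still get covered across many producers, while each single producer touches at most `limit` of them.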
With the implementation described so far, a partitioned producer can connect to only part of the partitions. This causes another issue, related to partitioned producer stats.
Partitioned producer stats are accumulated over all partitions, and the accumulation procedure assumes that "all partitions have the same producer".
Therefore, we couldn’t get correct stats for a partial producer like the one on the right. We would like to get correct stats not only for the total producer but also for partial producers.
To solve this issue, we introduced producerStatsKey. Publisher stats with the same value for this property are accumulated as the same producer.
We also added behavior so that a partitioned producer sets the same producerStatsKey on each of its internal producers.
With this feature, we can get correct partitioned producer stats.
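The accumulation rule can be sketched like this. The field and method names are simplified stand-ins, not Pulsar's real stats classes: per-partition publisher entries that share a producerStatsKey are summed as one logical producer.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of stats accumulation keyed by producerStatsKey (illustrative names).
class StatsAggregator {
    // Each entry: producerStatsKey of one internal producer -> its msgRateIn.
    // Entries sharing the same key are summed as a single logical producer.
    static Map<String, Double> accumulate(List<Map.Entry<String, Double>> perPartitionStats) {
        Map<String, Double> total = new HashMap<>();
        for (Map.Entry<String, Double> e : perPartitionStats) {
            total.merge(e.getKey(), e.getValue(), Double::sum);
        }
        return total;
    }
}
```

A partial producer that touches only some partitions still shows up as one row per producerStatsKey, so its totals are no longer mixed with other producers on the same topic.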
Next, I will talk about the benchmarking results with and without this feature, using a toy example.
Now, let’s talk about the benchmarking.
The criteria of this benchmark are here.
The assumptions of this benchmark are here.
The procedure and variables of this benchmark are here.
The environment of this benchmark is here.
In this benchmark, each broker and the client are run as processes on a single laptop.
The results for b=5, l=3 (proposed) are here,
and here.
The results for b=5, l=3 (existing) are here,
and here.
The latency for b=5, l=3 is here.
The results for b=5, l=5 (proposed) are here,
and here.
The latency for b=5, l=5 is here.
For the existing side, the conditions are the same as for b=5, l=3, so we reuse that result.
The results for b=10, l=5 (proposed) are here,
and here.
The results for b=10, l=5 (existing) are here,
and here.
The latency for b=10, l=5 is here.
Now, let’s consider the results. For the client side, the results suggest the following behavior. In particular, the number of TCP connections is less than or equal to that of "existing". If we use the feature, the number of TCP connections is at most "limit + 1", which follows directly from the design.
Here is part of the results. Please look at the b=5, l=3 row. As you can see, both the number of TCP connections and the producer initialization time are smaller than for "existing".
For the broker side, the results suggest the following behavior. In particular, the number of brokers in which these values increased is smaller than for "existing", probably because the number of active brokers depends on the active topics.
Moreover, the CPU percentage per broker is greater than for "existing" on some brokers, perhaps because the topic load isn’t distributed across all brokers.
Here is part of the results. Please look at the right one. This matrix shows the number of topics loaded by each broker. As you can see, the proposed one is not completely distributed; the ratio is 1 to 2. In contrast, the existing one is evenly distributed.
This causes load and system-resource bias between brokers. Therefore, before using this feature, we should take care with the config that limits the number of partitions, or implement a smarter message router, for example one that considers the current load.
Next, I will talk about conclusion.
In conclusion, I talked about the implementation of the producer lazy-loading and partial round-robin features, and about their performance in a toy example.
One of the future tasks is to implement this feature in the other clients, such as C++, Go, etc.
I’m really excited to be involved in Apache Pulsar.
Thank you for your attention. That’s all for my talk.