Development of a Distributed Stream Processing System (DSPS) in Node.js and ZeroMQ, with a demonstration of a trending-topics application on a Twitter dataset.
1. Development of a Distributed Stream Processing System
Maycon Viana Bordin
Final Assignment
Instituto de Informática
Universidade Federal do Rio Grande do Sul
CMP157 – PDP 2013/2, Claudio Geyer
71. Test Environment
GridRS - PUCRS
3 nodes
4 x 3.52 GHz (Intel Xeon)
2 GB RAM
Linux 2.6.32-5-amd64
Gigabit Ethernet
72. Metrics
Runtime
Latency: time for a tuple to traverse the graph
Throughput: number of tuples processed per second
Loss of Tuples
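As a minimal sketch of how the latency, throughput, and tuple-loss metrics above could be computed, the helpers below use plain arithmetic on timestamps and counters. The function names and parameters are hypothetical and are not taken from the project's source. The latency helper also assumes that source and sink timestamps come from synchronized clocks.

```javascript
// Hypothetical helpers for illustration; names are not from the project.

// Latency of one tuple in ms: arrival time at the sink minus
// emission time at the source (assumes synchronized clocks).
function tupleLatency(emittedAtMs, arrivedAtMs) {
  return arrivedAtMs - emittedAtMs;
}

// Throughput in tuples per second over one reporting interval,
// derived from two cumulative "tuples processed" counters.
function throughput(prevCount, currCount, intervalMs) {
  return (currCount - prevCount) / (intervalMs / 1000);
}

// Fraction of emitted tuples that never reached the sink.
function lossRate(emitted, received) {
  return emitted === 0 ? 0 : (emitted - received) / emitted;
}
```

For example, an operator whose counter grows from 3000 to 9000 tuples over a 3000 ms interval has processed 2000 tuples per second in that interval.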
Methodology
5 runs per test.
Every 3 s, each operator sends its status with the number of tuples processed.
The PerfMon sink samples one tuple every 100 ms and sends the average latency every 3 s, then discards the collected tuples.
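The sampling scheme described for the PerfMon sink could look roughly like the sketch below. It assumes each tuple carries its emission timestamp and that the current time is passed in explicitly. The class name `LatencySampler` and its methods are hypothetical, not the project's API.

```javascript
// Hypothetical sketch of a sampling latency monitor; not the project's code.
class LatencySampler {
  constructor(sampleEveryMs = 100) {
    this.sampleEveryMs = sampleEveryMs;
    this.lastSampleAt = -Infinity; // guarantees the first tuple is sampled
    this.samples = [];             // latencies (ms) collected since last flush
  }

  // Called for every tuple reaching the sink; records at most one
  // latency sample per sampling period and ignores the rest.
  onTuple(emittedAtMs, nowMs) {
    if (nowMs - this.lastSampleAt < this.sampleEveryMs) return;
    this.lastSampleAt = nowMs;
    this.samples.push(nowMs - emittedAtMs);
  }

  // Called once per reporting period: returns the average latency of
  // the collected samples (or null if there are none) and clears them.
  flush() {
    if (this.samples.length === 0) return null;
    const sum = this.samples.reduce((a, b) => a + b, 0);
    const avg = sum / this.samples.length;
    this.samples = [];
    return avg;
  }
}
```

In a running sink, `flush()` would typically be driven by a timer such as `setInterval(() => report(sampler.flush()), 3000)`, where `report` is a placeholder for whatever sends the value to the monitor. Passing the clock in explicitly keeps the sampler easy to test without real timers.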
Variables
Number of nodes
Number of operator instances
Window size
88. References
Chakravarthy, Sharma. Stream data processing: a quality of service perspective: modeling, scheduling, load shedding, and complex event processing. Vol. 36. Springer, 2009.
Cormode, Graham, and S. Muthukrishnan. "An improved data stream summary: the count-min sketch and its applications." Journal of Algorithms 55.1 (2005): 58-75.
Gulisano, Vincenzo Massimiliano, Ricardo Jiménez Peris, and Patrick Valduriez. StreamCloud: An Elastic Parallel-Distributed Stream Processing Engine. Diss. Informatica, 2012.
Source code @ github.com/mayconbordin/tempest