SlideShare une entreprise Scribd logo
1  sur  1
Télécharger pour lire hors ligne
Adapting to Workloads Characteristics
General stream slicing automatically adapts to workload characteristics
to improve window aggregation performance without sacrificing
general applicability.
We make the following contributions:
1. We identify the workload characteristics which impact the
applicability and performance of window aggregation techniques.
2. We contribute general stream slicing, a generally applicable and
highly efficient solution for streaming window aggregation.
3. We analyze the implications of workload characteristics and show
that stream slicing is generally applicable while offering better
performance than existing approaches
Efficient Window Aggregation with
General Stream Slicing
Jonas Traub
jonas.traub@tu-berlin.de
Philipp M. Grulich
philipp.grulich@campus.tu-berlin.de
Alejandro Rodríguez Cuéllar
alejandro.rodriguez88@gmail.com
Sebastian Breß
sebastian.bress@dfki.de
Abstract and Contributions
Architecture of General Stream Slicing
22nd International Conference on Extending Database Technology (EDBT), March 26-29, 2019, Lisbon, Portugal
Asterios Katsifodimos
a.katsifodimos@tudelft.nl
Tilmann Rabl
rabl@tu-berlin.de
Volker Markl
volker.markl@tu-berlin.de
Aggregation Techniques
Keep individual tuples?
Workload Characteristics
Are splits required? How to remove tuples?
Performance Evaluation
Key findings
• General slicing outperform
alternative concepts with
respect to throughput and
scales to large numbers of
concurrent windows.
Impact of Out-Of-Order TuplesScalability (20% out-of-order) Impact of Aggregation Functions (20% out-of-order)
Tuple centric Slice centric
• Stream slicing and Buckets scale
with constant throughput to
large fractions of out-of-order
tuples and are robust against
high delays of these tuples.
• On time-based windows, stream slicing performs diverse
distributive and algebraic aggregations with similarly high
throughputs. Considering count-based windows and out-
of-order tuples, invertible aggregations lead to higher
throughputs than not invertible ones.
Open Source Repository:
tu-berlin-dima.github.io/scotty-window-processor

Contenu connexe

Plus de Jonas Traub

Flink Forward 2018: Efficient Window Aggregation with Stream Slicing
Flink Forward 2018: Efficient Window Aggregation with Stream SlicingFlink Forward 2018: Efficient Window Aggregation with Stream Slicing
Flink Forward 2018: Efficient Window Aggregation with Stream Slicing
Jonas Traub
 
Scotty: Efficient Window Aggregation for Out-of-Order Stream Processing
Scotty: Efficient Window Aggregation for Out-of-Order Stream ProcessingScotty: Efficient Window Aggregation for Out-of-Order Stream Processing
Scotty: Efficient Window Aggregation for Out-of-Order Stream Processing
Jonas Traub
 
UZH Stream Reasoning Workshop 2018: Optimized On-Demand Data Streaming from S...
UZH Stream Reasoning Workshop 2018: Optimized On-Demand Data Streaming from S...UZH Stream Reasoning Workshop 2018: Optimized On-Demand Data Streaming from S...
UZH Stream Reasoning Workshop 2018: Optimized On-Demand Data Streaming from S...
Jonas Traub
 

Plus de Jonas Traub (10)

Flink Forward 2018: Efficient Window Aggregation with Stream Slicing
Flink Forward 2018: Efficient Window Aggregation with Stream SlicingFlink Forward 2018: Efficient Window Aggregation with Stream Slicing
Flink Forward 2018: Efficient Window Aggregation with Stream Slicing
 
Scotty: Efficient Window Aggregation for Out-of-Order Stream Processing
Scotty: Efficient Window Aggregation for Out-of-Order Stream ProcessingScotty: Efficient Window Aggregation for Out-of-Order Stream Processing
Scotty: Efficient Window Aggregation for Out-of-Order Stream Processing
 
Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive W...
Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive W...Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive W...
Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive W...
 
Efficient SIMD Vectorization for Hashing in OpenCL
Efficient SIMD Vectorization for Hashing in OpenCLEfficient SIMD Vectorization for Hashing in OpenCL
Efficient SIMD Vectorization for Hashing in OpenCL
 
UZH Stream Reasoning Workshop 2018: Optimized On-Demand Data Streaming from S...
UZH Stream Reasoning Workshop 2018: Optimized On-Demand Data Streaming from S...UZH Stream Reasoning Workshop 2018: Optimized On-Demand Data Streaming from S...
UZH Stream Reasoning Workshop 2018: Optimized On-Demand Data Streaming from S...
 
JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of ...
JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of ...JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of ...
JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of ...
 
I²: Interactive Real-Time Visualization for Streaming Data with Apache Flink ...
I²: Interactive Real-Time Visualization for Streaming Data with Apache Flink ...I²: Interactive Real-Time Visualization for Streaming Data with Apache Flink ...
I²: Interactive Real-Time Visualization for Streaming Data with Apache Flink ...
 
I²: Interactive Real-Time Visualization for Streaming Data
I²: Interactive Real-Time Visualization for Streaming DataI²: Interactive Real-Time Visualization for Streaming Data
I²: Interactive Real-Time Visualization for Streaming Data
 
LWA 2015: The Apache Flink Platform (Poster)
LWA 2015: The Apache Flink Platform (Poster)LWA 2015: The Apache Flink Platform (Poster)
LWA 2015: The Apache Flink Platform (Poster)
 
LWA 2015: The Apache Flink Platform for Parallel Batch and Stream Analysis
LWA 2015: The Apache Flink Platform for Parallel Batch and Stream AnalysisLWA 2015: The Apache Flink Platform for Parallel Batch and Stream Analysis
LWA 2015: The Apache Flink Platform for Parallel Batch and Stream Analysis
 

Dernier

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Dernier (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

Efficient Window Aggregation with General Stream Slicing (Poster for EDBT Best Paper 2019)

  • 1. Adapting to Workloads Characteristics General stream slicing automatically adapts to workload characteristics to improve window aggregation performance without sacrificing general applicability. We make the following contributions: 1. We identify the workload characteristics which impact the applicability and performance of window aggregation techniques. 2. We contribute general stream slicing, a generally applicable and highly efficient solution for streaming window aggregation. 3. We analyze the implications of workload characteristics and show that stream slicing is generally applicable while offering better performance than existing approaches Efficient Window Aggregation with General Stream Slicing Jonas Traub jonas.traub@tu-berlin.de Philipp M. Grulich philipp.grulich@campus.tu-berlin.de Alejandro Rodríguez Cuéllar alejandro.rodriguez88@gmail.com Sebastian Breß sebastian.bress@dfki.de Abstract and Contributions Architecture of General Stream Slicing 22nd International Conference on Extending Database Technology (EDBT), March 26-29, 2019, Lisbon, Portugal Asterios Katsifodimos a.katsifodimos@tudelft.nl Tilmann Rabl rabl@tu-berlin.de Volker Markl volker.markl@tu-berlin.de Aggregation Techniques Keep individual tuples? Workload Characteristics Are splits required? How to remove tuples? Performance Evaluation Key findings • General slicing outperform alternative concepts with respect to throughput and scales to large numbers of concurrent windows. Impact of Out-Of-Order TuplesScalability (20% out-of-order) Impact of Aggregation Functions (20% out-of-order) Tuple centric Slice centric • Stream slicing and Buckets scale with constant throughput to large fractions of out-of-order tuples and are robust against high delays of these tuples. • On time-based windows, stream slicing performs diverse distributive and algebraic aggregations with similarly high throughputs. Considering count-based windows and out- of-order tuples, invertible aggregations lead to higher throughputs than not invertible ones. Open Source Repository: tu-berlin-dima.github.io/scotty-window-processor