MonetDB/DataCell

                   Exploiting the Power of Relational
                     Databases for Efficient Stream
                               Processing

                                        CWI
                             Project Meeting@Innsbruck
                               Feb 28 - Mar 04, 2011




Wednesday, March 02, 2011
DBMS versus DSMS

   DBMS
   [Figure: incoming data is stored in a DB on disk storage; a one-time query is submitted and an answer is returned]
   1    Store incoming tuples
   2    Submit one-time query
   3    Query processing on the already stored data
   4    Create answer

   DSMS
   [Figure: an input stream flows through continuous queries held in memory; notifications are sent out]
   1    Submit continuous queries
   2    Incoming streams
   3    Input stream is processed on the fly
   4    The produced results are continuously delivered to the clients

   A data stream is a never-ending sequence of tuples.

One-time Queries versus Continuous Queries
   [Figure: timeline "t of data" with the arrival time of q marked between tn and tn+1; "One-time query" is labelled on the tn side and "Continuous query" on the tn+1 side]

              One-time query
               • Evaluated once over the already stored tuples

               Continuous query
               • Waits for future incoming tuples
               • Evaluated continuously as new tuples arrive



Observation
   • Nowadays stream systems are built from scratch

   • Operators and optimizations are redesigned

   • Relational databases are considered inefficient and too complex

   • Modern stream applications require management of both
      stored and streaming data




Goals
   • We design the DataCell on top of an existing DataBase Kernel

   • Exploit database techniques, query optimization and operators

   • Provide full language functionalities (SQL’03)

   • Research questions
      • is it viable?
      • multi-query processing/scheduling
      • real-time processing



The Basic Idea of DataCell
      • Stream tuples are first stored in (appended to) baskets.

      • We evaluate the continuous queries over the baskets.
             Data Streams: each incoming tuple is thrown against the waiting queries (tuple -> Query Set).
             DataBase: first collect the data, then throw the queries against the tuples (Query Set -> tuples).

      • Once a tuple is seen, it is dropped from its basket.
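
A minimal Python sketch of this idea follows. The Basket class, its methods and the two sample queries are hypothetical names chosen for illustration, not DataCell code: tuples are appended first, the waiting queries are then run over the collected batch, and the tuples that have been seen are dropped.

```python
from typing import Callable, Dict, List

Query = Callable[[List[int]], List[int]]


class Basket:
    """Hypothetical basket: collects tuples until the queries are run."""

    def __init__(self) -> None:
        self.tuples: List[int] = []

    def append(self, t: int) -> None:
        self.tuples.append(t)

    def evaluate(self, queries: Dict[str, Query]) -> Dict[str, List[int]]:
        """Throw every waiting query against the collected tuples, then drop them."""
        batch = self.tuples
        self.tuples = []  # seen tuples leave the basket
        return {name: q(batch) for name, q in queries.items()}


queries: Dict[str, Query] = {
    "large": lambda batch: [t for t in batch if t > 10],
    "total": lambda batch: [sum(batch)],
}

basket = Basket()
for t in (12, 3, 100):
    basket.append(t)

results = basket.evaluate(queries)
```

Because each query receives a whole batch, it can be executed with ordinary bulk operators instead of a tuple-at-a-time loop.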


The MonetDB/DataCell stack

   MonetDB:
      SQL Query
        -> SQL: Query parser -> Query Optimizer
        -> MAL: MAL Interpreter -> Query Executor

   MonetDB with DataCell:
      SQL Query
        -> SQL: Query parser + CQ -> Query Optimizer + DC opt
        -> Continuous Query Scheduler
        -> MAL: MAL Interpreter -> Query Executor




DataCell Components
                            Receptor   <=>   Listens to a stream
                            Emitter    <=>   Delivers events to the clients
                            Factory    <=>   Continuous query
                            Basket     <=>   Holds events

        Input Stream -> R -> Q -> E -> Output Stream
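
The chain of components can be wired together as in the following Python sketch. The function names mirror the component names on the slide, but the signatures, the deque-based baskets and the sample predicate are assumptions for illustration only.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List


def receptor(stream: Iterable[int], basket: Deque[int]) -> None:
    """Listen to a stream: append every incoming event to a basket."""
    basket.extend(stream)


def factory(inp: Deque[int], out: Deque[int], predicate: Callable[[int], bool]) -> None:
    """Continuous query: consume the input basket and fill the output basket."""
    while inp:
        event = inp.popleft()
        if predicate(event):
            out.append(event)


def emitter(basket: Deque[int], deliver: Callable[[int], None]) -> None:
    """Deliver every event held in the basket to a client callback."""
    while basket:
        deliver(basket.popleft())


in_basket: Deque[int] = deque()   # basket between receptor and factory
out_basket: Deque[int] = deque()  # basket between factory and emitter
delivered: List[int] = []         # stands in for a client

receptor([12, 3, 100, 14], in_basket)
factory(in_basket, out_basket, lambda a: a > 10)
emitter(out_basket, delivered.append)
```

Baskets are the only point of contact between the three active components, so each of them can be scheduled independently.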


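DataCell Architecture
   [Figure: DataCell architecture. Top: SQL Compiler, SPARQL Compiler, MAL Optimizer, Continuous Query Scheduler. Left: receptors R1, R2, R3 with baskets (id a, id b, id c, id k, id m, id n) stored as Data Columns. Right: baskets (id a’, id b’, id k’, id k’’, id n’) leading to emitters E1, E2, E3. Bottom: Disk Storage. Legend: Basket, Receptor, Emitter, Factory.]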
Basket Expressions
      • Syntax:
             It is an SQL sub-query surrounded by square brackets

      • Semantics:
            All qualifying tuples in a basket expression are removed by the factories

           Tumbling window
           Q1: SELECT * FROM [SELECT * FROM X TOP 3] AS S WHERE S.a > 10;
           Example: X holds 12, 3, 100, 14; Q1 outputs 12, 100.

           Sliding window
           Q2: SELECT * FROM (
                   [SELECT * FROM X TOP 1]
                   UNION
                   SELECT * FROM X TOP 2 OFFSET 1) AS S
               WHERE S.a > 10;
           Example: X holds 12, 3, 100, 14; Q2 outputs 12, 100.

      • Flexible/expressive continuous queries, by selectively picking the data to
         process from a basket

      • Allows processing of predicate windows on a stream.
         • out-of-order processing


Query processing strategies
            Separate Baskets

     • Each continuous query is encapsulated within a single factory
     • Each factory f has its own input baskets, which are accessed only by f
     • If more than one factory is interested in the same data, we create
          multiple copies of this data

     • Factories are completely independent
     • Exploit column-store to minimize the overhead of replication

     [Figure: basket b -> Qcopy -> bcopy1, bcopy2, bcopy3 -> Q1, Q2, Q3]
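
A minimal Python sketch of the separate-baskets strategy, assuming list-based baskets: copy_factory plays the role of Qcopy in the figure and run_query stands for a factory with a private input basket. Both names and the three sample queries are hypothetical.

```python
from typing import Callable, List


def copy_factory(b: List[int], copies: List[List[int]]) -> None:
    """Qcopy: replicate the content of b into every private basket, then empty b."""
    for c in copies:
        c.extend(b)
    b.clear()


def run_query(private: List[int], query: Callable[[List[int]], List[int]]) -> List[int]:
    """A factory reads only its own basket and drops what it has seen."""
    result = query(private)
    private.clear()
    return result


b: List[int] = [12, 3, 100]
bcopy1: List[int] = []
bcopy2: List[int] = []
bcopy3: List[int] = []

copy_factory(b, [bcopy1, bcopy2, bcopy3])
r1 = run_query(bcopy1, lambda ts: [t for t in ts if t > 10])
r2 = run_query(bcopy2, lambda ts: [t for t in ts if t < 10])
r3 = run_query(bcopy3, lambda ts: [len(ts)])
```

No factory ever touches another factory's basket, so the three queries could run in any order; the price is one copy of the data per query.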

Query processing strategies
          Shared Baskets

      • Exploit query similarities to avoid replication
      • Baskets are shared among factories
      • Two new (cheap) factories: Locker and Unlocker

      [Figure: basket b -> Lock -> FL1, FL2, FL3 -> Q1, Q2, Q3 -> FU1, FU2, FU3 -> Unlock]
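
One possible reading of the shared-baskets strategy is sketched below in Python. The slides do not spell out the protocol, so the behaviour here is an assumption: the locker freezes the basket, every query reads the same tuples without copying them, and the unlocker drops the tuples that were seen and releases the basket. SharedBasket, locker, unlocker and run_round are hypothetical names, and the FL/FU nodes of the figure are not modelled.

```python
import threading
from typing import Callable, List

Query = Callable[[List[int]], List[int]]


class SharedBasket:
    """Hypothetical basket read by several factories."""

    def __init__(self) -> None:
        self.tuples: List[int] = []
        self.lock = threading.Lock()


def locker(b: SharedBasket) -> int:
    """Lock the basket and report how many tuples this round will see."""
    b.lock.acquire()
    return len(b.tuples)


def unlocker(b: SharedBasket, seen: int) -> None:
    """Drop the tuples seen in this round and release the basket."""
    del b.tuples[:seen]
    b.lock.release()


def run_round(b: SharedBasket, queries: List[Query]) -> List[List[int]]:
    seen = locker(b)
    try:
        # every query reads the same list; nothing is replicated
        return [q(b.tuples) for q in queries]
    finally:
        unlocker(b, seen)


shared = SharedBasket()
shared.tuples.extend([12, 3, 100])
round_results = run_round(shared, [
    lambda ts: [t for t in ts if t > 10],
    lambda ts: [t for t in ts if t < 10],
])
```

Compared with separate baskets, the data is stored once, at the cost of synchronising the factories around the lock.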


Summary

   [image] + [image] = DataCell




Linking Smart Cities Datasets with Human Computation: the case of UrbanMatch
 
SciQL, Bridging the Gap between Science and Relational DBMS
SciQL, Bridging the Gap between Science and Relational DBMSSciQL, Bridging the Gap between Science and Relational DBMS
SciQL, Bridging the Gap between Science and Relational DBMS
 
CLODA: A Crowdsourced Linked Open Data Architecture
CLODA: A Crowdsourced Linked Open Data ArchitectureCLODA: A Crowdsourced Linked Open Data Architecture
CLODA: A Crowdsourced Linked Open Data Architecture
 
Scalable Nonmonotonic Reasoning over RDF Data Using MapReduce
Scalable Nonmonotonic Reasoning over RDF Data Using MapReduceScalable Nonmonotonic Reasoning over RDF Data Using MapReduce
Scalable Nonmonotonic Reasoning over RDF Data Using MapReduce
 
Data and Knowledge Evolution
Data and Knowledge Evolution  Data and Knowledge Evolution
Data and Knowledge Evolution
 
Evolution of Workflow Provenance Information in the Presence of Custom Infere...
Evolution of Workflow Provenance Information in the Presence of Custom Infere...Evolution of Workflow Provenance Information in the Presence of Custom Infere...
Evolution of Workflow Provenance Information in the Presence of Custom Infere...
 
Access Control for RDF graphs using Abstract Models
Access Control for RDF graphs using Abstract ModelsAccess Control for RDF graphs using Abstract Models
Access Control for RDF graphs using Abstract Models
 
Arrays in Databases, the next frontier?
Arrays in Databases, the next frontier?Arrays in Databases, the next frontier?
Arrays in Databases, the next frontier?
 
Abstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF DatasetsAbstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF Datasets
 
Towards Parallel Nonmonotonic Reasoning with Billions of Facts
Towards Parallel Nonmonotonic Reasoning with Billions of FactsTowards Parallel Nonmonotonic Reasoning with Billions of Facts
Towards Parallel Nonmonotonic Reasoning with Billions of Facts
 
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
 
Heuristic based Query Optimisation for SPARQL
Heuristic based Query Optimisation for SPARQLHeuristic based Query Optimisation for SPARQL
Heuristic based Query Optimisation for SPARQL
 

MonetDB/DataCell - Exploiting the Power of Relational Databases for Efficient Stream Processing

  • 1. MonetDB/DataCell: Exploiting the Power of Relational Databases for Efficient Stream Processing. CWI, Project Meeting@Innsbruck, Feb 28 - Mar 04, 2011. Wednesday, March 02, 2011
  • 3. DBMS versus DSMS
      DBMS (figure labels: Incoming data, DB, Disk storage, One-time query, answer):
        1. Store incoming tuples
        2. Submit one-time query
        3. Query processing on the already stored data
        4. Create answer
      DSMS (figure labels: Input stream, Memory, Continuous queries, notification):
        1. Submit continuous queries
        2. Incoming streams
        3. Input stream is processed on the fly
        4. The produced results are continuously delivered to the clients
      A data stream is a never-ending sequence of tuples.
  • 4. One-time Queries versus Continuous Queries
      [Figure: timeline of the data (t_n, t_n+1) marking the arrival time of q, with the regions labelled "One-time query" and "Continuous query"]
      One-time query:
        • Evaluated once over the already stored tuples
      Continuous query:
        • Waits for future incoming tuples
        • Evaluated continuously as new tuples arrive
  • 9. Observation
      • Nowadays stream systems are built from scratch
          • Redesign operators and optimizations
          • Relational databases are considered inefficient and too complex
      • Modern stream applications require management of both stored and streaming data
  • 10. Goals
      • We design the DataCell on top of an existing database kernel
          • Exploit database techniques, query optimization and operators
          • Provide full language functionality (SQL'03)
      • Research questions
          • Is it viable?
          • Multi-query processing/scheduling
          • Real-time processing
  • 11. The Basic Idea of DataCell
      • Stream tuples are first stored in (appended to) baskets.
      • We evaluate the continuous queries over the baskets.
        Instead of throwing each incoming tuple against the waiting queries (Data Streams), first collect the data and then throw the queries against the tuples (DataBase).
        [Figure: a tuple and a Query Set, drawn once for each of the two approaches]
      • Once a tuple is seen, it is dropped from its basket.
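The basket idea on this slide can be illustrated with a minimal Python sketch. It is not DataCell code: the `Basket` class, `run_query` and the integer tuples are hypothetical names and data chosen for illustration, and a plain predicate stands in for a continuous query.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Basket:
    """Hypothetical basket: buffers stream tuples until a query has seen them."""
    tuples: List[int] = field(default_factory=list)

    def append(self, batch: List[int]) -> None:
        # Incoming stream tuples are first appended to the basket.
        self.tuples.extend(batch)

    def drain(self) -> List[int]:
        # Hand out everything collected so far and drop it from the basket.
        batch, self.tuples = self.tuples, []
        return batch


def run_query(basket: Basket, predicate: Callable[[int], bool]) -> List[int]:
    """Throw one query against all collected tuples (batch evaluation)."""
    return [t for t in basket.drain() if predicate(t)]


basket = Basket()
basket.append([12, 3, 100])
first = run_query(basket, lambda a: a > 10)   # evaluates over 12, 3, 100

basket.append([14, 7])
second = run_query(basket, lambda a: a > 10)  # sees only the new tuples
```

Each call processes a whole batch at once, which is the point of collecting data first: the query runs as an ordinary set-oriented operation instead of being triggered per tuple.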
  • 12. The MonetDB/DataCell stack
      [Figure: the MonetDB stack. Labels: SQL Query, SQL Query parser, Query Optimizer, MAL, MAL Interpreter, Query Executor]
  • 13. The MonetDB/DataCell stack
      [Figure: the same stack with the DataCell additions. Labels: SQL Query, SQL Query parser + CQ, Query Optimizer + DC opt, Continuous Query Scheduler, MAL, MAL Interpreter, Query Executor]
  • 14. DataCell Components
      • Receptor: listens to a stream
      • Emitter: delivers events to the clients
      • Factory: continuous query
      • Basket: holds events
      [Figure labels: Input Stream, R, Q, E, Output Stream]
  • 15. DataCell Architecture
      [Figure: architecture diagram. Labels: SQL Compiler, MAL Optimizer, Data Columns, Disk Storage, DataCell, Continuous Query Scheduler, receptors R1-R3, emitters E1-E3. Legend: Basket, Receptor, Emitter, Factory]
  • 19. DataCell Architecture
      [Figure: the same architecture diagram with a SPARQL Compiler next to the SQL Compiler. Other labels: MAL Optimizer, Data Columns, Disk Storage, DataCell, Continuous Query Scheduler, receptors R1-R3, emitters E1-E3. Legend: Basket, Receptor, Emitter, Factory]
  • 26. Basket Expressions
      • Syntax: an SQL sub-query surrounded by square brackets
      • Semantics: all qualifying tuples in a basket expression are removed by the factories
      Tumbling window:
        Q1: SELECT * FROM [SELECT * FROM X TOP 3] AS S WHERE S.a > 10;
      Sliding window:
        Q2: SELECT * FROM ([SELECT * FROM X TOP 1] UNION SELECT * FROM X TOP 2 OFFSET 1) AS S WHERE S.a > 10;
      [Figure: for both Q1 and Q2, a basket holding 12, 3, 100, 14 and the output 12, 100]
      • Flexible/expressive continuous queries, by selectively picking the data to process from a basket
      • Allows processing predicate windows on a stream
      • Out-of-order processing
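The difference between Q1 and Q2 can be mimicked on a plain Python list. This is only a sketch of the stated semantics (tuples selected inside square brackets are removed, tuples selected outside them are merely read); the function names are hypothetical, the basket is assumed to hold the attribute `a` in arrival order, and each call assumes at least three tuples are available.

```python
from typing import List


def tumbling_q1(basket: List[int], size: int = 3, threshold: int = 10) -> List[int]:
    """Sketch of Q1: the bracketed sub-query consumes the first `size` tuples."""
    window = basket[:size]
    del basket[:size]                  # bracketed part: tuples leave the basket
    return [a for a in window if a > threshold]


def sliding_q2(basket: List[int], threshold: int = 10) -> List[int]:
    """Sketch of Q2: consume one tuple, but also read the next two."""
    consumed = basket[:1]              # [SELECT * FROM X TOP 1]
    peeked = basket[1:3]               # SELECT * FROM X TOP 2 OFFSET 1 (not removed)
    del basket[:1]
    # UNION removes duplicates; dict.fromkeys keeps first occurrences in order.
    window = list(dict.fromkeys(consumed + peeked))
    return [a for a in window if a > threshold]


x = [12, 3, 100, 14]
q1_out = tumbling_q1(x)                # window 12, 3, 100; all three are consumed

y = [12, 3, 100, 14]
q2_first = sliding_q2(y)               # window 12, 3, 100; only 12 is consumed
q2_second = sliding_q2(y)              # window 3, 100, 14; only 3 is consumed
```

With the basket contents from the slide, Q1 leaves only 14 behind, so the next window starts fresh (tumbling), while Q2 advances by one tuple per evaluation (sliding).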
  • 27. Query processing strategies: Separate Baskets
      • Each continuous query is encapsulated within a single factory
      • Each factory f has its own input baskets, which are accessed only by f
      • If more than one factory is interested in the same data, we create multiple copies of this data
      • Factories are completely independent
      • Exploit the column-store to minimize the overhead of replication
      [Figure: basket b, a factory Qcopy, and the copies bcopy1, bcopy2, bcopy3 next to the queries Q1, Q2, Q3]
  • 32. Query processing strategies: Shared Baskets
      • Exploit query similarities to avoid replication
      • Baskets are shared among factories
      • Two new (cheap) factories: Locker and Unlocker
      [Figure: shared basket b; each query Q1, Q2, Q3 sits between a factory FL1, FL2, FL3 and a factory FU1, FU2, FU3; arrows labelled Lock and Unlock]
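The slides name the Locker and Unlocker factories but do not spell out their protocol. The Python sketch below shows one plausible reading, stated here as an assumption: a lock step freezes the current contents for a round of queries, and the tuples are dropped only once every registered query has unlocked. `SharedBasket` and its methods are hypothetical names, not DataCell's API.

```python
from typing import Iterable, List, Set


class SharedBasket:
    """Hypothetical shared basket read by several queries without copying."""

    def __init__(self, query_ids: Iterable[str]) -> None:
        self.tuples: List[int] = []
        self.query_ids: Set[str] = set(query_ids)
        self.pending: Set[str] = set()   # queries that still owe an unlock
        self.locked_count = 0            # number of tuples covered by this round

    def lock(self) -> List[int]:
        """Locker step (assumed): fix the batch all queries will read."""
        self.pending = set(self.query_ids)
        self.locked_count = len(self.tuples)
        return self.tuples[:self.locked_count]

    def unlock(self, query_id: str) -> None:
        """Unlocker step (assumed): drop the batch once every query is done."""
        self.pending.discard(query_id)
        if not self.pending:
            del self.tuples[:self.locked_count]
            self.locked_count = 0


shared = SharedBasket(["Q1", "Q2", "Q3"])
shared.tuples.extend([12, 3, 100, 14])

batch = shared.lock()
shared.tuples.append(7)                  # arrives mid-round, not part of the batch

results = {
    "Q1": [a for a in batch if a > 10],
    "Q2": [a for a in batch if a < 10],
    "Q3": [a for a in batch if a % 2 == 0],
}
for query_id in results:
    shared.unlock(query_id)
```

Compared with the separate-baskets strategy, no per-query copy of the data is made; the cost is the coordination needed so that a tuple is not removed before the slowest query has read it.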
  • 33. Summary: [figure] + [figure] = DataCell