Astronomical Data Processing Using
   SciQL, an SQL Based Query
     Language for Array Data



 Ying Zhang, Bart Scheers, Martin Kersten, Milena Ivanova, Niels Nes
                          CWI Amsterdam

             ADASS XXI, Nov. 06-10, 2011, Paris, France

Why Not RDBMS?

             SQL is difficult

                No appropriate array denotations

                No functionally complete operation set

             DBMSs are slow

                Too much overhead

                Size limitations (due to BLOB representations)

                Existing foreign files

                Scale

                ...




2011-11-09                               ADASS XXI                  3
SciQL
             An array query language based on SQL:2003

             To lower the entrance fee to RDBMSs



             Distinguishing features (Kersten et al., AD '11; Zhang et al.,
             IDEAS 2011):

                Arrays and tables as first class citizens of DBMSs

                Seamless integration of relational and array paradigms

                Named dimensions with constraints

                Flexible structure-based grouping




             LOFAR Transient Key Project use case

Array Definitions

                 y
                 3   0.0   0.0   0.0   0.0
                 2   0.0   0.0   0.0   0.0
                 1   0.0   0.0   0.0   0.0
                 0   0.0   0.0   0.0   0.0
                      0     1     2     3    x

                 (null outside the grid)

             CREATE ARRAY A1 (
              x INT DIMENSION [0:1:4], y INT DIMENSION [0:1:4],
              v FLOAT DEFAULT 0.0);

             Dimensions (x, y): any scalar data type
             Dimensional range: [(start|*) : (step|*) : (stop|*)]
             Cell values (v): any column data type

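As a rough illustration of the definition above, the following Python sketch builds the same 4 x 4 array as a dictionary. It assumes that the stop value of a dimensional range is exclusive, which is what the grid (indices 0 to 3 for [0:1:4]) suggests. The helper names dimension_values and create_array are made up for this sketch and are not part of SciQL.

```python
def dimension_values(start, step, stop):
    """Index values of a bounded dimension [start:step:stop] (stop assumed exclusive)."""
    return list(range(start, stop, step))


def create_array(x_range, y_range, default=0.0):
    """Return a dict that maps every (x, y) cell to the default cell value."""
    xs = dimension_values(*x_range)
    ys = dimension_values(*y_range)
    return {(x, y): default for x in xs for y in ys}


# Mirrors CREATE ARRAY A1 (x ... [0:1:4], y ... [0:1:4], v FLOAT DEFAULT 0.0)
a1 = create_array((0, 1, 4), (0, 1, 4), default=0.0)

print(len(a1))         # number of cells in the array
print(a1.get((4, 0)))  # a cell outside the ranges; None stands in for null
```

A dictionary keyed by coordinates is only one possible representation; it is used here because a missing key gives a natural stand-in for the null cells around the grid.
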
Array Tiling

                SELECT [x], [y], AVG(v) FROM A1
                GROUP BY A1[x:x+2][y:y+2];

             Anchor point: A1[x][y]

             Input A1 (null outside the grid):

                 y
                 3   0.0    0.0    0.0    0.0
                 2   0.0    0.0    0.0    0.0
                 1   0.0    0.5    0.5    0.5
                 0   0.0    0.0    0.0    0.0
                      0      1      2      3     x

             Result:

                 y
                 3   0.0    0.0    0.0    0.0
                 2   0.0    0.0    0.0    0.0
                 1   0.125  0.25   0.25   0.25
                 0   0.125  0.25   0.25   0.25
                      0      1      2      3     x

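The tiling query can be mimicked in a few lines of Python. This is a sketch under two assumptions read off the slide: every cell A1[x][y] acts in turn as the anchor of a 2 x 2 tile covering x to x+1 and y to y+1, and cells that fall outside the array (null) are ignored by AVG. The function name tile_avg is hypothetical.

```python
def tile_avg(cells, xs, ys, width=2, height=2):
    """Average the tile anchored at each (x, y), skipping cells outside the array."""
    result = {}
    for x in xs:
        for y in ys:
            tile = [cells[(i, j)]
                    for i in range(x, x + width)
                    for j in range(y, y + height)
                    if (i, j) in cells]  # out-of-range cells are treated as null
            result[(x, y)] = sum(tile) / len(tile)
    return result


# Input array from the slide: all 0.0 except three cells in row y = 1
xs = ys = range(4)
a1 = {(x, y): 0.0 for x in xs for y in ys}
for x in (1, 2, 3):
    a1[(x, 1)] = 0.5

avg = tile_avg(a1, xs, ys)
for y in (3, 2, 1, 0):
    print(y, [avg[(x, y)] for x in xs])
```

With the input grid above, the sketch gives 0.125 at the anchors (0, 0) and (0, 1) and 0.25 at the other anchors in rows 0 and 1, which agrees with the result grid on the slide.
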
LOFAR Catalogue

  [Figure: the catalogue as a multi-dimensional array.
   Dimensions: zone (-90 ... 90; Gray et al. 2006), meridian (0 ... 359),
   frequency (ν1, ν2, ν3, ν4, ...), time (t1, t2, t3, t4, ...),
   Stokes parameter (I, Q, U, V).
   Cell attributes: ra DOUBLE, decl DOUBLE, ra_err DOUBLE, decl_err DOUBLE,
   flux DOUBLE, ...]

         CREATE ARRAY LOFARsrc (
           zone INT DIMENSION[-90:1:91], mrdn INT DIMENSION[0:1:360],
           ts   TIMESTAMP DIMENSION,     freq INT DIMENSION[30:10:241],
           id   INT DIMENSION[0:1:*],    stks CHAR(1) DIMENSION
                   CHECK(stks='I' OR stks='Q' OR stks='U' OR stks='V'),
           ra DOUBLE, decl DOUBLE, ra_err DOUBLE, decl_err DOUBLE,
           flux DOUBLE, ...);

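The slide does not say how a source position is assigned to a zone and a meridian cell. The sketch below assumes one-degree cells obtained by flooring declination and right ascension, which would fit the ranges [-90:1:91] and [0:1:360] if stop values are exclusive. The function cell_coordinates and this mapping are assumptions made for illustration, not the catalogue's documented scheme.

```python
import math


def cell_coordinates(ra_deg, decl_deg):
    """Map a sky position in degrees to assumed (zone, mrdn) indices."""
    if not -90.0 <= decl_deg <= 90.0:
        raise ValueError("declination must lie in [-90, 90]")
    zone = math.floor(decl_deg)      # -90 .. 90, inside [-90:1:91]
    mrdn = math.floor(ra_deg) % 360  # 0 .. 359, inside [0:1:360]
    return zone, mrdn


# Values of the freq dimension [30:10:241], again with an exclusive stop
freqs = list(range(30, 241, 10))

print(cell_coordinates(359.9, -0.5))
print(freqs[0], freqs[-1], len(freqs))
```

Under this reading, a declination of exactly 90 degrees lands in zone 90, which would explain why the zone range stops at 91 rather than 90.
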
LOFAR Use Case

             Similarity of the flux of a LOFAR source at frequencies 30
             MHz and 200 MHz

                          cross-correlation of two time series

Cross-Correlation

        idx       0       1       2       3
  F
        val       4       3       6       2

        idx       0       1       2
  G
        val       1       5       7

        Cr.idx     F slice       G slice       Cr.val
          -3       F [3 : 4]     G [0 : 1]        2
          -2       F [2 : 4]     G [0 : 2]       16
          -1       F [1 : 4]     G [0 : 3]       47
           0       F [0 : 3]     G [0 : 3]       61
           1       F [0 : 2]     G [1 : 3]       41
           2       F [0 : 1]     G [2 : 3]       28

        idx      -3      -2      -1       0       1       2
  Cr
        val       2      16      47      61      41      28

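The walkthrough above can be reproduced with a short Python sketch. For each Cr.idx it takes the overlapping slices of F and G and sums the element-wise products. The slice bounds are generalised from the six steps on the slides, assuming slice ends are exclusive as in F [3 : 4]. The function name cross_correlate is hypothetical.

```python
def cross_correlate(f, g):
    """Sum of products of the overlapping parts of f and g for every shift idx."""
    n, m = len(f), len(g)
    result = {}
    for idx in range(-n + 1, m):
        f_part = f[max(0, -idx):min(n, m - idx)]  # e.g. F [3 : 4] for idx = -3
        g_part = g[max(0, idx):min(m, n + idx)]   # e.g. G [0 : 1] for idx = -3
        result[idx] = sum(a * b for a, b in zip(f_part, g_part))
    return result


cr = cross_correlate([4, 3, 6, 2], [1, 5, 7])
print(cr)  # {-3: 2, -2: 16, -1: 47, 0: 61, 1: 41, 2: 28}
```

The two slices always have the same length, so zip pairs every element of the overlap exactly once.
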
LOFAR Use Case

         -- retrieve the time series
         DECLARE fcnt INT, gcnt INT;
         SET fcnt = (SELECT COUNT(*) FROM LOFARsrc[*][*][*][30][11]['I']);
         SET gcnt = (SELECT COUNT(*) FROM LOFARsrc[*][*][*][200][11]['I']);

         CREATE ARRAY VIEW F (idx INT DIMENSION[0:1:fcnt], flux DOUBLE DEFAULT 0.0) AS SELECT flux FROM
            LOFARsrc[*][*][*][30][11]['I'];
         CREATE ARRAY VIEW G (idx INT DIMENSION[0:1:gcnt], flux DOUBLE DEFAULT 0.0) AS SELECT flux FROM
            LOFARsrc[*][*][*][200][11]['I'];

         CREATE ARRAY CrCorr30_200 (idx INT DIMENSION[-fcnt+1:1:gcnt], val DOUBLE DEFAULT 0.0);
         -- dynamic grouping for every iteration
         INSERT INTO CrCorr30_200 SELECT SUM(F.flux * G.flux) FROM F, G, CrCorr30_200 AS C
           GROUP BY F[MAX(0, -C.idx) : MIN(fcnt, gcnt-C.idx)], G[MAX(0, C.idx) : MIN(gcnt, fcnt+C.idx)];

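The first half of the query fixes the frequency, the source id and the Stokes parameter and leaves zone, meridian and time open. A Python sketch of that selection follows. The sample cells are made up for illustration, integers stand in for timestamps, and ordering the result by time is an assumption, since the slide does not state how the view assigns idx. The function name time_series is hypothetical.

```python
def time_series(cells, freq, src_id, stokes):
    """Flux values of one source at one frequency and Stokes parameter, ordered by time."""
    selected = [(ts, flux)
                for (zone, mrdn, ts, f, i, s), flux in cells.items()
                if f == freq and i == src_id and s == stokes]
    return [flux for ts, flux in sorted(selected)]


# Made-up cells keyed by (zone, mrdn, ts, freq, id, stks), with flux as the value
cells = {
    (10, 20, 3, 30, 11, "I"): 6.0,
    (10, 20, 1, 30, 11, "I"): 4.0,
    (10, 20, 2, 30, 11, "I"): 3.0,
    (10, 20, 1, 200, 11, "I"): 1.0,
    (10, 20, 2, 200, 11, "I"): 5.0,
    (10, 20, 1, 30, 12, "I"): 9.0,  # different source id, not selected
    (10, 20, 1, 30, 11, "Q"): 8.0,  # different Stokes parameter, not selected
}

f = time_series(cells, 30, 11, "I")   # plays the role of view F
g = time_series(cells, 200, 11, "I")  # plays the role of view G
print(len(f), len(g))                 # the counterparts of fcnt and gcnt
```

The two lists can then be fed to any cross-correlation routine that accepts series of different lengths.
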
Conclusion

             SciQL: a novel query language for scientific data

                A symbiosis of the relational and array paradigms

             Simplifies expression of complex scientific algorithms

             Leaves optimisation to the DBMS kernel

             Opens opportunities to enhance scientific data mining



             Under active implementation


                   www.scilens.org          www.monetdb.org
                                                   2.#(4&#$5()*+,#-&$".1(6&$&



                                                   !"#$%&'()&"#*+,-(     ./0/123
                                                   4")*'()5"%%,%*'(*#-(( 6!7(8 9:7;;9




2011-11-09                             ADASS XXI                                                 21

More Related Content

More from PlanetData Network of Excellence

Access Control for RDF graphs using Abstract Models
Access Control for RDF graphs using Abstract ModelsAccess Control for RDF graphs using Abstract Models
Access Control for RDF graphs using Abstract Models
PlanetData Network of Excellence
 
Abstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF DatasetsAbstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF Datasets
PlanetData Network of Excellence
 
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
PlanetData Network of Excellence
 
Heuristic based Query Optimisation for SPARQL
Heuristic based Query Optimisation for SPARQLHeuristic based Query Optimisation for SPARQL
Heuristic based Query Optimisation for SPARQL
PlanetData Network of Excellence
 

More from PlanetData Network of Excellence (20)

Demo: tablet-based visualisation of transport data in Madrid using SPARQLstream
Demo: tablet-based visualisation of transport data in Madrid using SPARQLstreamDemo: tablet-based visualisation of transport data in Madrid using SPARQLstream
Demo: tablet-based visualisation of transport data in Madrid using SPARQLstream
 
On the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream ProcessingOn the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream Processing
 
Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...
Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...
Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...
 
Linking Smart Cities Datasets with Human Computation: the case of UrbanMatch
Linking Smart Cities Datasets with Human Computation: the case of UrbanMatchLinking Smart Cities Datasets with Human Computation: the case of UrbanMatch
Linking Smart Cities Datasets with Human Computation: the case of UrbanMatch
 
SciQL, Bridging the Gap between Science and Relational DBMS
SciQL, Bridging the Gap between Science and Relational DBMSSciQL, Bridging the Gap between Science and Relational DBMS
SciQL, Bridging the Gap between Science and Relational DBMS
 
CLODA: A Crowdsourced Linked Open Data Architecture
Astronomical Data Processing Using SciQL, an SQL Based Query Language for Array Data

  • 1. Astronomical Data Processing Using SciQL, an SQL Based Query Language for Array Data. Ying Zhang, Bart Scheers, Martin Kersten, Milena Ivanova, Niels Nes (CWI Amsterdam). ADASS XXI, Nov. 06-10, 2011, Paris, France.
  • 2.
  • 3. Why Not RDBMS? SQL is difficult: no appropriate array denotations; no functionally complete operation set. DBMSs are slow: too much overhead; size limitations (due to BLOB representations); existing foreign files; scale; ... 2011-11-09 ADASS XXI 3
  • 4. SciQL: an array query language based on SQL:2003, to lower the entrance fee to RDBMSs. Distinguishing features (Kersten et al. AD ’11; Zhang et al. IDEAS2011): arrays and tables as first-class citizens of DBMSs; seamless integration of relational and array paradigms; named dimensions with constraints; flexible structure-based grouping. Use case: LOFAR Transient Key Project.
  • 5. Array Definitions. [Figure: array A1 as a 4 × 4 grid over dimensions x and y (index values 0 to 3); every cell holds 0.0, and positions outside the dimension ranges are null.]
      CREATE ARRAY A1 (
        x INT DIMENSION [0:1:4],
        y INT DIMENSION [0:1:4],
        v FLOAT DEFAULT 0.0);
  • 6. Array Definitions (same figure and statement, annotated): dimensions, any scalar data type.
  • 7. Array Definitions (same figure and statement, annotated): dimensional range: [(start|∗) : (step|∗) : (stop|∗)].
  • 8. Array Definitions (same figure and statement, annotated): cell values, any column data type.
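As a rough illustration of the dimensional range notation above, the following Python sketch (not SciQL, and not taken from the slides) enumerates the index values of a range [start:step:stop] and fills every cell with the default value. It assumes that stop is exclusive, which matches the figure's index values 0 to 3 for [0:1:4]; the helper names `dimension` and `create_array` are hypothetical.

```python
from itertools import product


def dimension(start, step, stop):
    """Index values of a range [start:step:stop]; stop is assumed to be exclusive."""
    return list(range(start, stop, step))


def create_array(dims, default):
    """Map every index tuple of the given dimensions to the default cell value."""
    axes = [dimension(*bounds) for bounds in dims.values()]
    return {index: default for index in product(*axes)}


# Mirrors: CREATE ARRAY A1 (x INT DIMENSION [0:1:4],
#                           y INT DIMENSION [0:1:4],
#                           v FLOAT DEFAULT 0.0);
A1 = create_array({"x": (0, 1, 4), "y": (0, 1, 4)}, default=0.0)

print(dimension(0, 1, 4))  # [0, 1, 2, 3]
print(len(A1))             # 16
print(A1.get((4, 0)))      # None: outside the dimension range
```

Under the same assumption, `dimension(30, 10, 241)` would enumerate 30, 40, ..., 240, the range declared for `freq` on the LOFAR Catalogue slide.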
  • 9. Array Tiling. SELECT [x], [y], AVG(v) FROM A1 GROUP BY A1[x:x+2][y:y+2]; [Figure: A1 before tiling; the cells at y = 1, x = 1 to 3 hold 0.5, all other cells hold 0.0, and positions outside the array are null.]
  • 10-14. Array Tiling (animation steps; same query and figure). Anchor point: A1[x][y].
  • 15. Array Tiling (result of the same query). [Figure: the rows y = 0 and y = 1 each hold 0.125, 0.25, 0.25, 0.25 for x = 0 to 3; the rows y = 2 and y = 3 hold 0.0.]
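The tiling query can be traced with a small Python sketch (an illustration only, not SciQL and not part of the talk). It assumes that each tile A1[x:x+2][y:y+2] covers the 2 × 2 cells starting at the anchor point, and that cells falling outside the array are null and are skipped by AVG; both assumptions are inferred from the values in the figures. The function name `tile_avg` is hypothetical.

```python
def tile_avg(cells, xs, ys, width=2, height=2):
    """Average of each width x height tile, anchored at every cell (x, y)."""
    result = {}
    for x in xs:
        for y in ys:
            # Cells outside the array are treated as null and left out of the average.
            tile = [cells[(i, j)]
                    for i in range(x, x + width)
                    for j in range(y, y + height)
                    if (i, j) in cells]
            result[(x, y)] = sum(tile) / len(tile)
    return result


xs = ys = range(4)
# Input as on the first tiling slide: 0.5 at y = 1, x = 1..3, and 0.0 elsewhere.
A1 = {(x, y): 0.0 for x in xs for y in ys}
for x in (1, 2, 3):
    A1[(x, 1)] = 0.5

averages = tile_avg(A1, xs, ys)
for y in reversed(ys):
    print(y, [averages[(x, y)] for x in xs])
# 3 [0.0, 0.0, 0.0, 0.0]
# 2 [0.0, 0.0, 0.0, 0.0]
# 1 [0.125, 0.25, 0.25, 0.25]
# 0 [0.125, 0.25, 0.25, 0.25]
```

The printed rows agree with the result figure, including the border column x = 3, where each tile contains only two non-null cells.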
  • 16. LOFAR Catalogue. [Figure: the sky divided into zones (Gray et al. 2006) from -90 to 90 against meridians 0 to 359, with further axes for time (t1, t2, t3, t4, ...), frequency (ν1 to ν4, ...) and I, Q, U, V; each cell stores ra, decl, ra_err, decl_err, flux, ... as DOUBLE values.]
      CREATE ARRAY LOFARsrc (
        zone INT DIMENSION[-90:1:91],
        mrdn INT DIMENSION[0:1:360],
        ts TIMESTAMP DIMENSION,
        freq INT DIMENSION[30:10:241],
        id INT DIMENSION[0:1:*],
        stks CHAR(1) DIMENSION CHECK(stks='I' OR stks='Q' OR stks='U' OR stks='V'),
        ra DOUBLE, decl DOUBLE, ra_err DOUBLE, decl_err DOUBLE, flux DOUBLE, ...);
  • 17. LOFAR Use Case. [Figure: LOFAR catalogue array, as on slide 16.] Similarity of the flux of a LOFAR source at frequencies 30 MHz and 200 MHz: cross-correlation of two time series.
  • 18. Cross-Correlation. Inputs F and G; the result Cr is filled in on slides 19 to 24.
      F   idx   0  1  2  3
          val   4  3  6  2
      G   idx   0  1  2
          val   1  5  7
      Cr  idx  -3 -2 -1  0  1  2
          val  (empty)
  • 19. Cross-Correlation. Cr.idx = -3: F[3:4] with G[0:1]. Cr val so far: 2.
  • 20. Cross-Correlation. Cr.idx = -2: F[2:4] with G[0:2]. Cr val so far: 2, 16.
  • 21. Cross-Correlation. Cr.idx = -1: F[1:4] with G[0:3]. Cr val so far: 2, 16, 47.
  • 22. Cross-Correlation. Cr.idx = 0: F[0:3] with G[0:3]. Cr val so far: 2, 16, 47, 61.
  • 23. Cross-Correlation. Cr.idx = 1: F[0:2] with G[1:3]. Cr val so far: 2, 16, 47, 61, 41.
  • 24. Cross-Correlation. Cr.idx = 2: F[0:1] with G[2:3]. Cr val: 2, 16, 47, 61, 41, 28.
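The walk-through above can be reproduced with a short Python sketch (an illustration, not SciQL and not code from the talk). For each index k of Cr it multiplies the overlapping slices of F and G element by element and sums the products. The slice bounds are the same MAX/MIN expressions that appear in the GROUP BY clause of the LOFAR use-case query; slice ends are assumed to be exclusive, as the slide examples suggest. The function name `cross_correlate` is hypothetical.

```python
def cross_correlate(f, g):
    """Cross-correlation of two sequences, keyed by the index k of the result Cr."""
    fcnt, gcnt = len(f), len(g)
    cr = {}
    for k in range(-fcnt + 1, gcnt):
        # Overlapping parts of F and G for this k (end bounds exclusive).
        f_lo, f_hi = max(0, -k), min(fcnt, gcnt - k)
        g_lo, g_hi = max(0, k), min(gcnt, fcnt + k)
        cr[k] = sum(a * b for a, b in zip(f[f_lo:f_hi], g[g_lo:g_hi]))
    return cr


F = [4, 3, 6, 2]
G = [1, 5, 7]
print(cross_correlate(F, G))
# {-3: 2, -2: 16, -1: 47, 0: 61, 1: 41, 2: 28}
```

For k = -2, for instance, the slices are F[2:4] = [6, 2] and G[0:2] = [1, 5], giving 6·1 + 2·5 = 16, the value shown on slide 20.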
  • 25. LOFAR Use Case. [Figure: LOFAR catalogue array, as on slide 16.]
      -- Number of samples of source 11 (stks 'I') at 30 MHz and at 200 MHz
      DECLARE fcnt INT, gcnt INT;
      SET fcnt = SELECT COUNT(*) FROM LOFARsrc[*][*][*][30][11]['I'];
      SET gcnt = SELECT COUNT(*) FROM LOFARsrc[*][*][*][200][11]['I'];
      -- The two time series as one-dimensional array views
      CREATE ARRAY VIEW F (idx INT DIMENSION[0:1:fcnt], flux DOUBLE DEFAULT 0.0)
        AS SELECT flux FROM LOFARsrc[*][*][*][30][11]['I'];
      CREATE ARRAY VIEW G (idx INT DIMENSION[0:1:gcnt], val DOUBLE DEFAULT 0.0)
        AS SELECT flux FROM LOFARsrc[*][*][*][200][11]['I'];
      -- Result array with one cell per index from -fcnt+1 up to gcnt
      CREATE ARRAY CrCorr30_200 (idx INT DIMENSION[-fcnt+1:1:gcnt], val DOUBLE DEFAULT 0.0);
      -- For each C.idx, sum the products over the overlapping parts of F and G
      INSERT INTO CrCorr30_200
        SELECT SUM(F.flux * G.val)
        FROM F, G, CrCorr30_200 AS C
        GROUP BY F[MAX(0, -C.idx) : MIN(fcnt, gcnt-C.idx)],
                 G[MAX(0, C.idx) : MIN(gcnt, fcnt+C.idx)];
  • 26. LOFAR Use Case (same figure and code, annotated): retrieve the time series.
  • 27. LOFAR Use Case (same figure and code, annotated): dynamic grouping for every iteration.
  • 28. Conclusion. SciQL: a novel query language for scientific data. A symbiosis of the relational and array paradigms. Simplifies the expression of complex scientific algorithms. Leaves optimisation to the DBMS kernel. Opens opportunities to enhance scientific data mining. Under active implementation. www.scilens.org, www.monetdb.org