Iván de Prado Alonso – CEO of Datasalt
www.datasalt.es
@ivanprado
@datasalt




                       Splout SQL
       When Big Data Output is also Big Data
Full SQL*                   Unlike NoSQL
For Big Data                Unlike RDBMS
Web latency & throughput    Unlike Impala, Apache Drill, etc.

* Within each partition
How does it work?




  Isolation between generation and serving
Generation

Generate tablespace CLIENTS_INFO with
   table CLIENTS partitioned by CID
   table SALES   partitioned by CID

Input tables

   Table CLIENTS
   CID      Name
   U20      Doug
   U21      Ted
   U40      John

   Table SALES
   SID      CID      Amount
   S100     U20      102
   S101     U20      60
   S223     U40      99

Tablespace CLIENTS_INFO

   Partition U10 – U35
      Table CLIENTS             Table SALES
      CID      Name             SID      CID      Amount
      U20      Doug             S100     U20      102
      U21      Ted              S101     U20      60

   Partition U36 – U60
      Table CLIENTS             Table SALES
      CID      Name             SID      CID      Amount
      U40      John             S223     U40      99
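The split shown above can be sketched in a few lines of Python. This is only an illustration of range partitioning by CID, not Splout SQL's Hadoop job: the names UPPER_BOUNDS, partition_for and build_tablespace are made up for the example, and only the upper end of each range is checked.

```python
from bisect import bisect_left

# Inclusive upper bound of each partition's key range, as on the slide:
# partition 0 holds U10 – U35, partition 1 holds U36 – U60.
UPPER_BOUNDS = ["U35", "U60"]

CLIENTS = [("U20", "Doug"), ("U21", "Ted"), ("U40", "John")]
SALES = [("S100", "U20", 102), ("S101", "U20", 60), ("S223", "U40", 99)]


def partition_for(cid):
    """Return the index of the partition whose key range contains cid."""
    idx = bisect_left(UPPER_BOUNDS, cid)
    if idx == len(UPPER_BOUNDS):
        raise KeyError(f"{cid} is above every partition range")
    return idx


def build_tablespace():
    """Split both tables by CID so rows sharing a key land in the same partition."""
    partitions = [{"CLIENTS": [], "SALES": []} for _ in UPPER_BOUNDS]
    for row in CLIENTS:
        partitions[partition_for(row[0])]["CLIENTS"].append(row)
    for row in SALES:
        partitions[partition_for(row[1])]["SALES"].append(row)
    return partitions


tablespace = build_tablespace()
for i, part in enumerate(tablespace):
    print(i, part)
```

Because both tables use the same partitioning key, every sale ends up next to its client, which is what makes single-partition joins possible at serving time.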
Serving

For key = 'U20', tablespace = 'CLIENTS_INFO':

   SELECT Name, sum(Amount) FROM
   CLIENTS c, SALES s WHERE
   c.CID = s.CID AND c.CID = 'U20';

Key U20 falls in partition U10 – U35, so only that partition is queried.


   Partition U10 – U35                Partition U36 – U60
          Table CLIENTS                      Table CLIENTS
      CID         Name                   CID          Name
      U20         Doug                   U40          John
      U21         Ted

           Table SALES                         Table SALES
    SID     CID     Amount             SID      CID      Amount
    S100    U20     102                S223     U40      99
    S101    U20     60
Serving

For key = 'U40', tablespace = 'CLIENTS_INFO':

   SELECT Name, sum(Amount) FROM
   CLIENTS c, SALES s WHERE
   c.CID = s.CID AND c.CID = 'U40';

Key U40 falls in partition U36 – U60, so only that partition is queried.


   Partition U10 – U35                Partition U36 – U60
          Table CLIENTS                      Table CLIENTS
      CID         Name                   CID          Name
      U20         Doug                   U40          John
      U21         Ted

           Table SALES                         Table SALES
    SID     CID     Amount             SID      CID      Amount
    S100    U20     102                S223     U40      99
    S101    U20     60
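A minimal sketch of the serving step, assuming one in-memory SQLite database per partition (Splout SQL partitions are backed by SQLite, but the routing code, the index and the helper names here are illustrative, not the product's own). The query is sent to the single partition that owns the key.

```python
import sqlite3
from bisect import bisect_left

UPPER_BOUNDS = ["U35", "U60"]  # partition 0: U10 – U35, partition 1: U36 – U60

CLIENTS = [("U20", "Doug"), ("U21", "Ted"), ("U40", "John")]
SALES = [("S100", "U20", 102), ("S101", "U20", 60), ("S223", "U40", 99)]


def partition_for(key):
    """Index of the partition whose key range contains key."""
    return bisect_left(UPPER_BOUNDS, key)


def open_partitions():
    """Create one in-memory SQLite database per partition and load its rows."""
    dbs = []
    for _ in UPPER_BOUNDS:
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE CLIENTS (CID TEXT PRIMARY KEY, Name TEXT)")
        db.execute("CREATE TABLE SALES (SID TEXT PRIMARY KEY, CID TEXT, Amount INTEGER)")
        db.execute("CREATE INDEX sales_cid ON SALES (CID)")
        dbs.append(db)
    for cid, name in CLIENTS:
        dbs[partition_for(cid)].execute("INSERT INTO CLIENTS VALUES (?, ?)", (cid, name))
    for sid, cid, amount in SALES:
        dbs[partition_for(cid)].execute("INSERT INTO SALES VALUES (?, ?, ?)", (sid, cid, amount))
    return dbs


QUERY = """
SELECT c.Name, sum(s.Amount)
FROM CLIENTS c JOIN SALES s ON c.CID = s.CID
WHERE c.CID = ?
GROUP BY c.Name
"""


def serve(dbs, key):
    """Run the query only on the partition that owns key."""
    return dbs[partition_for(key)].execute(QUERY, (key,)).fetchall()


dbs = open_partitions()
print(serve(dbs, "U20"))  # [('Doug', 162)]
print(serve(dbs, "U40"))  # [('John', 99)]
```

The join never crosses partitions: both tables were split on CID, so everything the query needs for one key is local to one SQLite file.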
Why does it scale?
   Data is partitioned

   Partitions are distributed across nodes

   Adding more nodes increases capacity

   Queries restricted to a single partition

   Generation does not impact serving
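To make the capacity argument concrete, here is a small sketch of spreading partitions over nodes. The round-robin placement and the replication factor of 2 are assumptions chosen for the example; the slides do not describe the actual placement policy.

```python
def assign_partitions(num_partitions, nodes, replication=2):
    """Place each partition on `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    placement = {node: [] for node in nodes}
    for p in range(num_partitions):
        for r in range(replication):
            placement[nodes[(p + r) % len(nodes)]].append(p)
    return placement


def max_load(placement):
    """Largest number of partitions any single node has to hold and serve."""
    return max(len(parts) for parts in placement.values())


two = assign_partitions(8, ["node-a", "node-b"])
four = assign_partitions(8, ["node-a", "node-b", "node-c", "node-d"])
print(max_load(two))   # 8
print(max_load(four))  # 4
```

With the same 8 partitions, doubling the nodes halves what each node stores and serves, and since every query touches a single partition, the load splits along the same lines.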
OK, so what is Splout SQL useful for?
Big Data Analytics
   Manageable output
Big Data Analytics
   Sometimes Big Data output is also Big Data
Splout SQL lets you serve Big Data results
Let’s see an example …
Building a Google Analytics
Imagine that one crazy day you decide to build
some kind of Google Analytics…

       Zillions of events
       Millions of domains
       Individual panel per domain
Requirements
 Time-based charts (day/hour aggregations)




 Flexible dimension breakdown
    Per page, per browser
    Per country, per language
    …
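A sketch of what one domain's panel could query, assuming a pre-aggregated table partitioned by domain. The table name hits, its columns and the sample rows are invented for illustration; they are not taken from the slides.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE hits (
        domain TEXT, day TEXT, hour INTEGER,
        page TEXT, browser TEXT, pageviews INTEGER
    )
""")
# Index matching the access pattern: one domain, ordered by time.
db.execute("CREATE INDEX hits_domain_time ON hits (domain, day, hour)")

# Made-up sample data.
rows = [
    ("example.com", "2013-03-01", 9, "/", "Firefox", 10),
    ("example.com", "2013-03-01", 9, "/docs", "Chrome", 4),
    ("example.com", "2013-03-01", 10, "/", "Chrome", 7),
    ("example.com", "2013-03-02", 9, "/", "Firefox", 5),
    ("other.org", "2013-03-01", 9, "/", "Chrome", 99),
]
db.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?, ?)", rows)


def per_day(domain):
    """Time-based chart: total pageviews per day for one domain."""
    return db.execute(
        "SELECT day, sum(pageviews) FROM hits WHERE domain = ? "
        "GROUP BY day ORDER BY day", (domain,)).fetchall()


def per_browser(domain, day):
    """Dimension breakdown: pageviews per browser for one domain and day."""
    return db.execute(
        "SELECT browser, sum(pageviews) FROM hits WHERE domain = ? AND day = ? "
        "GROUP BY browser ORDER BY browser", (domain, day)).fetchall()


print(per_day("example.com"))                    # [('2013-03-01', 21), ('2013-03-02', 5)]
print(per_browser("example.com", "2013-03-01"))  # [('Chrome', 11), ('Firefox', 10)]
```

Every query is filtered by domain, so domain is a natural partitioning key: each panel only ever reads one partition.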
With Splout SQL
Splout SQL provides SQL consolidated views for Hadoop data
Let’s see more details about Splout SQL
Splout SQL Architecture
Each partition is …
      Backed by SQLite

      Generated on Hadoop
        Including any indexes needed
        Data can be sorted before insertion to
        minimize disk seeks at query time
        Pre-sampling for balancing partition size
      Distributed on Splout SQL cluster
        With replication for failover
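The pre-sampling idea can be sketched as follows: take a random sample of the partitioning keys and cut it at evenly spaced quantiles, so each partition receives roughly the same number of rows. The function names, the sample size and the synthetic keys are assumptions for the example; the slides do not detail how Splout SQL samples.

```python
import random
from bisect import bisect_left
from collections import Counter


def sample_boundaries(keys, num_partitions, sample_size=1000, seed=0):
    """Pick partition upper bounds from the quantiles of a random key sample."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    if len(sample) < num_partitions:
        raise ValueError("sample is smaller than the number of partitions")
    # The i-th boundary is the last key of the i-th equal slice of the sample.
    return [sample[len(sample) * i // num_partitions - 1]
            for i in range(1, num_partitions)]


def partition_sizes(keys, boundaries):
    """Count how many keys fall into each partition for the given upper bounds."""
    counts = Counter(bisect_left(boundaries, k) for k in keys)
    return [counts[i] for i in range(len(boundaries) + 1)]


keys = [f"U{i:04d}" for i in range(1000)]  # synthetic keys U0000 .. U0999
bounds = sample_boundaries(keys, 4)
print(bounds)                         # ['U0249', 'U0499', 'U0749']
print(partition_sizes(keys, bounds))  # [250, 250, 250, 250]
```

Here the sample covers every key, so the split is exact; with a real sample the sizes would only be approximately equal.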
Atomicity
   A tablespace is a set of tables that
   share the same partitioning schema

   Tablespaces are versioned
        Only one version served at a time

   Several tablespaces can be deployed
   at once
        All-or-nothing semantics (atomicity)
        Rollback support
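A single-process sketch of these semantics: load all new versions first, switch what is served only if every load succeeded, and keep the history so a rollback can return to the previous version. The class TablespaceRegistry and the loader callback are hypothetical names, and a real cluster would need coordination between nodes that this example leaves out.

```python
class TablespaceRegistry:
    """Tracks deployed versions and which single version is being served."""

    def __init__(self):
        self.versions = {}  # tablespace -> deployed versions, oldest first
        self.serving = {}   # tablespace -> version currently served

    def deploy(self, new_versions, loader):
        """Deploy several tablespaces at once with all-or-nothing semantics."""
        # Phase 1: load everything; any failure aborts before serving changes.
        for name, version in new_versions.items():
            loader(name, version)
        # Phase 2: switch every pointer only after all loads succeeded.
        for name, version in new_versions.items():
            self.versions.setdefault(name, []).append(version)
            self.serving[name] = version

    def rollback(self, name):
        """Serve the previous version of a tablespace again."""
        history = self.versions[name]
        if len(history) < 2:
            raise ValueError(f"no earlier version of {name} to roll back to")
        history.pop()
        self.serving[name] = history[-1]


def always_ok(name, version):
    """Stand-in loader that pretends the partition files were fetched."""


registry = TablespaceRegistry()
registry.deploy({"CLIENTS_INFO": 1}, always_ok)
registry.deploy({"CLIENTS_INFO": 2}, always_ok)
registry.rollback("CLIENTS_INFO")
print(registry.serving)  # {'CLIENTS_INFO': 1}
```

If the loader raises for any tablespace in a deployment, phase 2 never runs and the old versions keep being served.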
Characteristics
    Ensured millisecond latencies
      Even when queries hit disk

      Controlled by the developer, who selects the proper:
        -   Cluster topology
        -   Partitioning
        -   Indexes
        -   Data colocation (insertion order)
Characteristics (II)
    100% SQL
      But restricted to a single partition
      Real-time aggregations
      Joins

    Scalability
      In data capacity
      In performance
Characteristics (III)
    Atomicity
      New data replaces old data all at once

    High availability
      Through the use of replication

    Open Source
Characteristics (IV)
    Easy to manage
      Changing the size of the cluster can be done
      without any downtime

    Read only
      Data is updated in batches
      Updates come from new tablespace
      deployments
Characteristics (V)
    Native connectors
      Hive
      Pig
      Cascading
API - Generation
    Command line
     Loading CSV files
      $ hadoop jar splout-*-hadoop.jar generate …


    Java API



    Connectors
API - Service
    REST API



                JSON response
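A query sent over HTTP has to carry three things seen earlier in the deck: the tablespace, the partitioning key and the SQL text. The sketch below only shows how such a request could be URL-encoded. The host, port, path layout and parameter names are placeholders, not the documented Splout SQL endpoint, so check the project documentation before relying on them.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def query_url(base, tablespace, key, sql):
    """Build a GET URL carrying the tablespace, partition key and SQL text.

    The /query/<tablespace>?key=...&sql=... layout is hypothetical.
    """
    params = urlencode({"key": key, "sql": sql})
    return f"{base}/query/{quote(tablespace)}?{params}"


SQL = "SELECT Name FROM CLIENTS WHERE CID = 'U20'"
url = query_url("http://qnode.example:8080", "CLIENTS_INFO", "U20", SQL)
print(url)

# Decoding the query string gives back exactly what was sent.
print(parse_qs(urlsplit(url).query))
```

The JSON response would then be decoded with a standard JSON parser; its fields are not shown in the slides, so none are assumed here.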
API - Console
Benchmark
   350 GB of Wikipedia logs
   Aggregation queries touching 15 rows on average
   2-machine cluster
     900 queries/second, 80 ms/query, 80 threads
Benchmark (II)
   4-machine cluster
     3150 queries/second, 40 ms/query, 160 threads




 More info:
    http://sploutsql.com/performance.html
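A quick sanity check of these figures, assuming each client thread issues one query at a time and that the quoted latency is an average (the slides state neither). Under that assumption, the thread count divided by the latency gives a ceiling on throughput.

```python
def thread_bound_qps(threads, latency_ms):
    """Upper bound on queries/second when each thread waits for its own query."""
    return threads * 1000 / latency_ms


runs = {
    "2 machines": {"threads": 80, "latency_ms": 80, "reported_qps": 900},
    "4 machines": {"threads": 160, "latency_ms": 40, "reported_qps": 3150},
}

for name, run in runs.items():
    ceiling = thread_bound_qps(run["threads"], run["latency_ms"])
    print(f"{name}: reported {run['reported_qps']} q/s, ceiling {ceiling:.0f} q/s")

speedup = runs["4 machines"]["reported_qps"] / runs["2 machines"]["reported_qps"]
print(f"throughput ratio, 4 vs 2 machines: {speedup:.1f}x")  # 3.5x
```

Both reported numbers sit below their ceilings (1000 and 4000 queries/second), so they are internally consistent. Note that the 4-machine run also doubled the client threads, so the 3.5x ratio is not attributable to the extra machines alone.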
Web latency

       SQL

       Consolidated Views

       For Hadoop
“A good candidate for the serving layer of a lambda architecture”
www.SploutCloud.com - Splout SQL as a service
Future work
   Growing the community
     Do you want to collaborate? 

   Automatic rebalancing on failover
     Almost done

   Some read/write capabilities
     Enabling Splout SQL to become the speed
     layer on lambda architectures
        Questions?

Splout SQL - Web latency SQL views for Hadoop
