Splout SQL - Web latency SQL views for Hadoop
Iván de Prado Alonso – CEO of Datasalt
www.datasalt.es
@ivanprado
@datasalt

Splout SQL
When Big Data Output is also Big Data
Full SQL*                   Unlike NoSQL
For Big Data                Unlike RDBMS
Web latency & throughput    Unlike Impala, Apache Drill, etc.

* Within each partition
How does it work?
   Isolation between generation and serving
Generation
Generate tablespace CLIENTS_INFO with
   table CLIENTS partitioned by CID
   table SALES   partitioned by CID

Input tables:
   Table CLIENTS
   CID    Name
   U20    Doug
   U21    Ted
   U40    John

   Table SALES
   SID     CID     Amount
   S100    U20     102
   S101    U20     60
   S223    U40     99

Resulting tablespace CLIENTS_INFO:
   Partition U10 – U35
      Table CLIENTS        Table SALES
      CID    Name          SID     CID     Amount
      U20    Doug          S100    U20     102
      U21    Ted           S101    U20     60

   Partition U36 – U60
      Table CLIENTS        Table SALES
      CID    Name          SID     CID     Amount
      U40    John          S223    U40     99
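The generation step can be sketched with Python's built-in sqlite3 module. This is not Splout SQL's generator (which runs on Hadoop); it is a minimal illustration, assuming range partitioning on CID with the two ranges from the slide, that routes every row of both tables to the partition owning its CID, so joins on CID stay inside one SQLite database. The names `partition_for` and `generate_tablespace` are hypothetical.

```python
import sqlite3
from bisect import bisect_right

# Sample input rows from the slide: CLIENTS(CID, Name) and SALES(SID, CID, Amount).
CLIENTS = [("U20", "Doug"), ("U21", "Ted"), ("U40", "John")]
SALES = [("S100", "U20", 102), ("S101", "U20", 60), ("S223", "U40", 99)]

# Assumed range partitioning on CID: partition i starts at LOWER_BOUNDS[i].
# String comparison is safe here because all keys share the same format.
# In this sketch, keys past the last range simply fall into the last partition.
LOWER_BOUNDS = ["U10", "U36"]  # U10 – U35 and U36 – U60

def partition_for(key: str) -> int:
    """Return the index of the range partition that owns `key`."""
    idx = bisect_right(LOWER_BOUNDS, key) - 1
    if idx < 0:
        raise KeyError(f"key {key!r} is below the first partition")
    return idx

def generate_tablespace() -> list:
    """Build one SQLite database per partition, co-partitioning both tables by CID."""
    parts = []
    for _ in LOWER_BOUNDS:
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE CLIENTS (CID TEXT, Name TEXT)")
        db.execute("CREATE TABLE SALES (SID TEXT, CID TEXT, Amount INTEGER)")
        parts.append(db)
    # Rows of both tables are routed by the same key, so a client and its
    # sales always land in the same partition.
    for cid, name in CLIENTS:
        parts[partition_for(cid)].execute("INSERT INTO CLIENTS VALUES (?, ?)", (cid, name))
    for sid, cid, amount in SALES:
        parts[partition_for(cid)].execute("INSERT INTO SALES VALUES (?, ?, ?)", (sid, cid, amount))
    for db in parts:
        db.commit()
    return parts

parts = generate_tablespace()
for i, db in enumerate(parts):
    print(i, db.execute("SELECT CID FROM CLIENTS ORDER BY CID").fetchall())
```

Co-partitioning both tables on the same key is what makes the per-partition joins on the next slides possible without any cross-node traffic.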
Serving
For key = 'U20', tablespace = 'CLIENTS_INFO':
   SELECT Name, SUM(Amount) FROM CLIENTS c, SALES s
   WHERE c.CID = s.CID AND c.CID = 'U20';
Only Partition U10 – U35 contains key U20, so the query runs on that partition alone.

   Partition U10 – U35                Partition U36 – U60
          Table CLIENTS                      Table CLIENTS
      CID         Name                   CID          Name
      U20         Doug                   U40          John
      U21         Ted

           Table SALES                         Table SALES
    SID     CID     Amount             SID      CID      Amount
    S100    U20     102                S223     U40      99
    S101    U20     60
Serving
For key = 'U40', tablespace = 'CLIENTS_INFO':
   SELECT Name, SUM(Amount) FROM CLIENTS c, SALES s
   WHERE c.CID = s.CID AND c.CID = 'U40';
Only Partition U36 – U60 contains key U40, so the query runs on that partition alone.

   Partition U10 – U35                Partition U36 – U60
          Table CLIENTS                      Table CLIENTS
      CID         Name                   CID          Name
      U20         Doug                   U40          John
      U21         Ted

           Table SALES                         Table SALES
    SID     CID     Amount             SID      CID      Amount
    S100    U20     102                S223     U40      99
    S101    U20     60
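Serving can be sketched the same way, assuming the same two ranges: the key selects exactly one partition, and the query runs on that partition's SQLite database only. The query is written with an explicit JOIN and GROUP BY, which gives the same result as the slide's query for these sample keys. `serve`, `build_partition` and the in-memory databases are illustrative, not Splout SQL's API.

```python
import sqlite3
from bisect import bisect_right

LOWER_BOUNDS = ["U10", "U36"]  # assumed range starts: U10 – U35 and U36 – U60

def build_partition(clients, sales):
    """Create one in-memory SQLite database holding a partition's rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE CLIENTS (CID TEXT, Name TEXT)")
    db.execute("CREATE TABLE SALES (SID TEXT, CID TEXT, Amount INTEGER)")
    db.executemany("INSERT INTO CLIENTS VALUES (?, ?)", clients)
    db.executemany("INSERT INTO SALES VALUES (?, ?, ?)", sales)
    return db

# The two partitions of tablespace CLIENTS_INFO, as shown on the slides.
PARTITIONS = [
    build_partition([("U20", "Doug"), ("U21", "Ted")],
                    [("S100", "U20", 102), ("S101", "U20", 60)]),
    build_partition([("U40", "John")], [("S223", "U40", 99)]),
]

# Column references are qualified, so SQLite cannot report an ambiguous CID.
QUERY = """
SELECT c.Name, SUM(s.Amount)
FROM CLIENTS c JOIN SALES s ON c.CID = s.CID
WHERE c.CID = ?
GROUP BY c.Name
"""

def serve(key):
    """Route the query to the single partition that owns `key` and run it there."""
    idx = bisect_right(LOWER_BOUNDS, key) - 1
    if idx < 0:
        raise KeyError(key)
    return PARTITIONS[idx].execute(QUERY, (key,)).fetchall()

print(serve("U20"))  # [('Doug', 162)]
print(serve("U40"))  # [('John', 99)]
```

Because a query never leaves its partition, adding partitions (and nodes to hold them) adds capacity without making individual queries more expensive.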
Why does it scale?
   Data is partitioned

   Partitions are distributed across nodes

   Adding more nodes increases capacity

   Queries restricted to a single partition

   Generation does not impact serving
OK, so what is Splout SQL useful for?
Big Data Analytics
   Manageable output
Big Data Analytics
   Sometimes Big Data output is also Big Data
Splout SQL lets you serve Big Data results
Let’s see an example …
Building a Google Analytics-like service
Imagine that one crazy day you decide to build
some kind of Google Analytics…

       Zillions of events
       Millions of domains
       Individual panel per domain
Requirements
 Time-based charts (day/hour aggregations)
 Flexible dimension breakdown
    Per page, per browser
    Per country, per language
    …
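To make these requirements concrete, here is a hypothetical per-domain schema queried with sqlite3. The table STATS, its columns and the sample rows are assumptions, not part of the talk: the sketch assumes Hadoop pre-aggregates raw events into hourly counts per combination of dimensions, and that the tablespace is partitioned by domain so every dashboard query hits a single partition.

```python
import sqlite3

# Hypothetical pre-aggregated table: one row per (domain, hour, dimensions).
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE STATS (
    domain TEXT, hour TEXT, page TEXT, browser TEXT,
    country TEXT, language TEXT, visits INTEGER)""")
db.executemany("INSERT INTO STATS VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("example.com", "2013-05-01 10", "/", "Firefox", "ES", "es", 30),
    ("example.com", "2013-05-01 11", "/", "Chrome", "ES", "es", 20),
    ("example.com", "2013-05-01 11", "/about", "Chrome", "US", "en", 5),
    ("example.com", "2013-05-02 09", "/", "Firefox", "US", "en", 12),
])
# Index built at generation time so per-domain, time-ranged lookups are cheap.
db.execute("CREATE INDEX idx_stats_domain_hour ON STATS (domain, hour)")

# Time-based chart: visits per day for one domain (day = first 10 chars of hour).
per_day = db.execute("""
    SELECT substr(hour, 1, 10) AS day, SUM(visits)
    FROM STATS WHERE domain = ?
    GROUP BY day ORDER BY day""", ("example.com",)).fetchall()

# Dimension breakdown: visits per browser for the same domain.
per_browser = db.execute("""
    SELECT browser, SUM(visits)
    FROM STATS WHERE domain = ?
    GROUP BY browser ORDER BY browser""", ("example.com",)).fetchall()

print(per_day)      # [('2013-05-01', 55), ('2013-05-02', 12)]
print(per_browser)  # [('Chrome', 25), ('Firefox', 42)]
```

Any other breakdown (per page, per country, per language) is just a different GROUP BY over the same partition, which is the flexibility that full SQL buys here.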
With Splout SQL
Splout SQL provides SQL consolidated views for Hadoop data
Let’s see more details about Splout SQL
Splout SQL Architecture
Each partition is …
      Backed by SQLite

      Generated on Hadoop
        Including any indexes needed
        Data can be sorted before insertion to
        minimize disk seeks at query time
        Pre-sampling for balancing partition size
      Distributed on Splout SQL cluster
        With replication for failover
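The slides mention pre-sampling without giving details. One common way to balance range partitions, shown here as an assumption rather than Splout SQL's exact algorithm, is to sort a random sample of the keys and take evenly spaced sample elements as split points. The key distribution below is made up and deliberately skewed.

```python
import random
from bisect import bisect_right
from collections import Counter

def sample_boundaries(keys, num_partitions, sample_size, seed=0):
    """Choose num_partitions - 1 split keys from a random sample so that
    each key range receives roughly the same number of rows."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_partitions
    return [sample[int(i * step)] for i in range(1, num_partitions)]

def assign(key, boundaries):
    """Partition index = number of split keys that are <= key."""
    return bisect_right(boundaries, key)

# Hypothetical skewed keys: zero-padded numbers so that string order matches
# numeric order; small values are far more frequent than large ones.
data_rng = random.Random(42)
keys = [f"{int(data_rng.expovariate(1 / 1000)):08d}" for _ in range(100_000)]

boundaries = sample_boundaries(keys, num_partitions=4, sample_size=1_000)
sizes = Counter(assign(k, boundaries) for k in keys)
print(boundaries)
print(sorted(sizes.items()))
```

Equal-width ranges over the same keys would put most rows in the first partition; sampling lets split points follow the actual data distribution at the cost of one extra pass over a small sample.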
Atomicity
   A tablespace is a set of tables that
   share the same partitioning schema

   Tablespaces are versioned
        Only one version served at a time

   Several tablespaces can be deployed
   at once
        All-or-nothing semantics (atomicity)
        Rollback support
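The versioning and all-or-nothing semantics can be modeled as a catalog that publishes a whole new mapping of tablespace versions in a single step. This is a hypothetical in-process model; the `Catalog` class, the `ready` callback and the STATS tablespace are illustrative, and the slides do not describe how Splout SQL coordinates the switch across nodes.

```python
from dataclasses import dataclass, field

@dataclass
class Catalog:
    """Hypothetical registry of which version of each tablespace is served."""
    current: dict = field(default_factory=dict)  # tablespace -> served version
    history: list = field(default_factory=list)  # previous `current` snapshots

    def deploy(self, new_versions, ready):
        """Switch every tablespace in `new_versions` at once, or none of them.

        `ready(tablespace, version)` stands in for the check that all partitions
        of that version have been copied to the serving nodes.
        """
        if not all(ready(ts, v) for ts, v in new_versions.items()):
            return False  # all-or-nothing: nothing is switched
        self.history.append(self.current)
        # Build the new mapping first, then publish it with one assignment,
        # so readers see either the old set of versions or the new one.
        self.current = {**self.current, **new_versions}
        return True

    def rollback(self):
        """Restore the previously served versions (IndexError if none)."""
        self.current = self.history.pop()

catalog = Catalog()
catalog.deploy({"CLIENTS_INFO": 1, "STATS": 1}, ready=lambda ts, v: True)
# STATS v2 is not fully copied yet, so CLIENTS_INFO v2 is not switched either.
ok = catalog.deploy({"CLIENTS_INFO": 2, "STATS": 2}, ready=lambda ts, v: ts != "STATS")
print(ok, catalog.current)
catalog.deploy({"CLIENTS_INFO": 2, "STATS": 2}, ready=lambda ts, v: True)
catalog.rollback()
print(catalog.current)
```

Keeping old versions around until they are no longer needed is what makes rollback a cheap pointer switch rather than a re-deployment.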
Characteristics
    Ensured millisecond latencies
     Even when queries hit disk

     Controlled by the developer through the choice of:
        -   Cluster topology
        -   Partitioning
        -   Indexes
        -   Data colocation (insertion order)
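Two of these levers, indexes and insertion order, can be tried directly in SQLite, which backs each partition. The sketch below uses hypothetical data: it sorts rows by CID before inserting them and creates an index on CID, and EXPLAIN QUERY PLAN then shows the lookup going through the index. The disk-seek benefit of sorting only shows on an on-disk database, not in the in-memory one used here.

```python
import random
import sqlite3

rng = random.Random(7)
# Hypothetical SALES rows for clients U10..U35, generated in random order.
rows = [(f"S{i:05d}", f"U{rng.randint(10, 35)}", rng.randint(1, 200))
        for i in range(10_000)]

# Colocation: insert rows sorted by the query key, so rows for the same CID
# get consecutive rowids and therefore sit in neighboring table pages.
rows.sort(key=lambda r: r[1])

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE SALES (SID TEXT, CID TEXT, Amount INTEGER)")
db.executemany("INSERT INTO SALES VALUES (?, ?, ?)", rows)
# Index on the lookup column, built at generation time like the data itself.
db.execute("CREATE INDEX idx_sales_cid ON SALES (CID)")
db.commit()

query = "SELECT SUM(Amount) FROM SALES WHERE CID = ?"
# The last column of each plan row is a human-readable description;
# its exact wording varies between SQLite versions.
plan = db.execute("EXPLAIN QUERY PLAN " + query, ("U20",)).fetchall()
total = db.execute(query, ("U20",)).fetchone()[0]
print(plan[-1][-1])
print(total)
```

Both choices are made once, at generation time on Hadoop, so they cost nothing at serving time.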
Characteristics (II)
    100% SQL
      But restricted to a single partition
      Real-time aggregations
       Joins

     Scalability
      In data capacity
      In performance
Characteristics (III)
    Atomicity
       New data replaces old data all at once

     High availability
       Through the use of replication

    Open Source
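Replication for failover can be illustrated with a simple placement scheme, assuming each partition is copied to several distinct nodes and a query goes to the first live replica. The round-robin placement, the function names and the node names are illustrative; the slides do not say how Splout SQL places replicas.

```python
def place_replicas(num_partitions, nodes, replication):
    """Assign each partition to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {p: [nodes[(p + r) % len(nodes)] for r in range(replication)]
            for p in range(num_partitions)}

def pick_replica(placement, partition, alive):
    """Return the first live node holding `partition`, failing over in order."""
    for node in placement[partition]:
        if node in alive:
            return node
    raise RuntimeError(f"no live replica for partition {partition}")

placement = place_replicas(4, ["n1", "n2", "n3"], replication=2)
print(placement)
print(pick_replica(placement, 1, alive={"n1", "n3"}))  # n2 is down -> n3
```

Because the data is read-only between deployments, replicas never diverge, so any live copy can answer a query.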
Characteristics (IV)
    Easy to manage
      Changing the size of the cluster can be done
      without any downtime

    Read-only
      Data is updated in batches
      Updates come from new tablespace
      deployments
Characteristics (V)
    Native connectors
      Hive
      Pig
      Cascading
API - Generation
    Command line
     Loading CSV files
      $ hadoop jar splout-*-hadoop.jar generate …
    Java API
    Connectors
API - Service
    REST API
      JSON response
API - Console
Benchmark
   350 GB of Wikipedia logs
   Aggregation queries touching 15 rows on average
   2-machine cluster
     900 queries/second, 80 ms/query, 80 threads
Benchmark (II)
   4-machine cluster
     3150 queries/second, 40 ms/query, 160 threads
 More info:
    http://sploutsql.com/performance.html
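The benchmark figures can be cross-checked with Little's law, assuming each client thread issues one query at a time and waits for the answer (the slides do not describe the load generator). The upper bound is threads divided by latency: 80 / 0.08 s = 1,000 queries/second and 160 / 0.04 s = 4,000 queries/second, both above the measured 900 and 3,150, so the reported numbers are mutually consistent.

```python
def littles_law_bound(threads, latency_ms):
    """Max queries/second that `threads` closed-loop clients can issue when
    every query takes `latency_ms` (throughput = concurrency / latency)."""
    return threads * 1000 / latency_ms

benchmarks = [
    # (cluster, client threads, ms/query, measured queries/second), from the slides
    ("2 machines", 80, 80, 900),
    ("4 machines", 160, 40, 3150),
]
for name, threads, latency_ms, measured in benchmarks:
    bound = littles_law_bound(threads, latency_ms)
    print(f"{name}: bound {bound:.0f} q/s, measured {measured} q/s "
          f"({measured / bound:.0%} of bound)")
```

The gap between bound and measurement would be time each thread spends outside the query itself (client overhead, network), under the stated assumption.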
Web latency
       SQL
       Consolidated Views
       For Hadoop
“A good candidate for the serving layer of a lambda architecture”
www.SploutCloud.com - Splout SQL as a service
Future work
   Growing the community
     Do you want to collaborate? 

   Automatic rebalancing on failover
     Almost done

   Some read/write capabilities
     Enabling Splout SQL to become the speed
     layer in lambda architectures
        Questions?