Apache Drill
             Design proposal from
              OpenDremel team


Camuel Gilyadov & Constantine Peresypkin,
Email: Camuel@BigDataCraft.com
OpenDremel Story: 2010

• Camuel Gilyadov started a Dremel implementation, named
  OpenDremel, in summer 2010.
• David Gruzman joined the effort a few months later,
  followed by Constantine Peresypkin.
• There wasn’t a comprehensive design or architecture.
  The goal was to get the hierarchical-columnar transformation
  working smoothly and in strict accordance with the
  Dremel paper. We published several working
  implementations under the Apache License.
• Hong San was hired as the first full-timer to speed up
  development. The Metaxa milestone was set.
OpenDremel Story: 2011
• The early OpenDremel design was found to be too naive, mainly due to
  Java underperformance in inner number-crunching loops.
• After fierce brainstorming, the project was restarted from scratch
  under the new name Dazo. With Dazo, a query plan is an arbitrary
  piece of executable native code, with a Java frontend.
• From then on we took inspiration from BigQuery rather than
  from the Dremel paper.
• We decided to use Google NaCl as the sandboxing technology to
  isolate queries as well as meter resource consumption. The new
  sandbox was named ZeroVM.
• For storage we decided to use OpenStack Swift.
OpenDremel Story: 2012

• With four people full-time and several others part-time, we still
  don’t have a fully integrated version, but we are satisfied
  with what we have achieved and convinced that the
  decisions behind Dazo were correct.
• We believe ZeroVM could be a disruptive technology in
  itself, revolutionizing the BigData@Cloud space.
• We are excited by the Apache Drill initiative and hope to be
  useful to it.
Design Tenet #1

• Apache Drill must support multi-tenant semantics
  internally rather than being run inside guest VMs.
• It should be inspired by BigQuery and not only by the
  Dremel/PowerDrill/Tenzing papers.
• It is not practical to set up a dedicated cloud (billed
  hourly) just to run a query for a few seconds.
• The codebase must be clearly divided into a trusted part
  and an untrusted part. The trusted part must be kept to an
  absolute minimum and must be peer-reviewed, secured,
  audited and metered.
Design Tenet #2

• Apache Drill must be extremely flexible and
  customizable.
• The schema-on-read concept must be supported.
  It must be possible to embed imperative,
  high-performance parser code into the query.
• SQL is no longer enough. It must be easy to add new
  query languages as plug-ins or as user-defined functions
  (UDFs).
• Additionally, various data formats must be supported,
  such as column stores, row stores, PAX, RCFile, etc.
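To make the schema-on-read idea concrete, here is a minimal sketch in Python. It is not taken from OpenDremel or Dazo: the names `parse_csv_lines`, `parse_json_lines`, and `total_clicks`, and the `name`/`clicks` fields, are all hypothetical. It only illustrates a query that carries its own parser and applies it to raw bytes at scan time, so that storage never needs to know the schema.

```python
import json
from typing import Callable, Dict, Iterator

# Hypothetical types for illustration: a record is a plain dict, and a
# parser is any function that turns raw bytes into records at read time.
Record = Dict[str, object]
Parser = Callable[[bytes], Iterator[Record]]


def parse_csv_lines(blob: bytes) -> Iterator[Record]:
    """User-supplied parser: the schema (name, clicks) lives here, not in storage."""
    for line in blob.decode("utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, clicks = line.split(",")
        yield {"name": name, "clicks": int(clicks)}


def parse_json_lines(blob: bytes) -> Iterator[Record]:
    """A second parser for the same logical schema stored as JSON lines."""
    for line in blob.decode("utf-8").splitlines():
        if line.strip():
            yield json.loads(line)


def total_clicks(blob: bytes, parser: Parser) -> int:
    """A 'query' that embeds its parser and applies it while scanning the blob."""
    return sum(int(rec["clicks"]) for rec in parser(blob))
```

The same query then runs over either format by swapping the parser, for example `total_clicks(raw_bytes, parse_csv_lines)`.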
Design Tenet #2 (cont.)

• We suggest relaxing the query plan format to
  arbitrary distributed executable code and the data
  format to an arbitrary opaque BLOB.
• This way new query languages and new data formats
  could be easily supported without changing the backend.
• As an added benefit, the backend becomes a generic, lightweight,
  homogeneous compute-storage cloud.
• Such an approach exhibits good separation of control:
  the cloud operator controls and bills for the generic
  infrastructure, and the query engine is left completely in
  the control of the tenant/user.
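The shape of such a generic backend can be sketched in a few lines of Python. This is an illustration under loud assumptions, not the Dazo or Zwift interface: `GenericBackend`, `put`, `run`, and `count_lines` are hypothetical names, and a Python callable stands in for the sandboxed native executable that the proposal actually has in mind. The point is only that the backend stores and moves bytes and never interprets either the job or the data.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in: in the proposal a job is arbitrary executable code;
# here it is any callable from blob bytes to result bytes.
Job = Callable[[bytes], bytes]


class GenericBackend:
    """Stores opaque blobs and runs opaque jobs next to them (illustrative only)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, blob: bytes) -> None:
        # The backend does not parse or validate the blob.
        self._blobs[key] = blob

    def run(self, job: Job, keys: List[str]) -> List[bytes]:
        # The backend does not inspect the job's logic either; it only
        # hands each requested blob to the job and collects the outputs.
        return [job(self._blobs[k]) for k in keys]


def count_lines(blob: bytes) -> bytes:
    """Tenant-supplied job: the data format (newline-separated) is known only here."""
    return str(len(blob.splitlines())).encode("ascii")
```

Supporting a new query language or data format then means shipping a different job, with no change to `GenericBackend`.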
Design Tenet #3

• Apache Drill requests/queries must be hyper-elastic,
  meaning able to exploit the compute capacity of
  thousands of servers for a short duration of just a few
  seconds. No resources should be kept spinning per user
  between queries or when idle.
• Traditional VMs are too heavyweight for that.
  Container approaches such as OpenVZ and LXC are
  not secure enough in a multi-tenancy context.
• We suggest making sandboxing pluggable and
  supporting ZeroVM (developed for OpenDremel) and
  LXC (fine for private clouds) to begin with.
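One way pluggable sandboxing could look is a small interface plus a registry of implementations, sketched below in Python. Everything here is an assumption made for illustration: `Sandbox`, `InProcessSandbox`, `register_sandbox`, and `create_sandbox` are hypothetical names, and no real ZeroVM or LXC API is used. The in-process placeholder provides no isolation at all; a real plug-in would launch the job inside ZeroVM or an LXC container behind the same `execute` method.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict

Job = Callable[[bytes], bytes]


class Sandbox(ABC):
    """Hypothetical plug-in interface: run one job on one input, return its output."""

    @abstractmethod
    def execute(self, job: Job, data: bytes) -> bytes:
        ...


class InProcessSandbox(Sandbox):
    """Placeholder with NO isolation, useful only for local testing."""

    def execute(self, job: Job, data: bytes) -> bytes:
        return job(data)


# Registry mapping a configured name to a factory for that sandbox type.
_REGISTRY: Dict[str, Callable[[], Sandbox]] = {}


def register_sandbox(name: str, factory: Callable[[], Sandbox]) -> None:
    _REGISTRY[name] = factory


def create_sandbox(name: str) -> Sandbox:
    # Raises KeyError if no plug-in was registered under this name.
    return _REGISTRY[name]()


register_sandbox("in-process", InProcessSandbox)
```

A deployment would then pick the implementation by name, for example `create_sandbox("in-process").execute(job, data)`, with the public-cloud and private-cloud variants registered under their own names.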
Design Tenet #4

•   Apache Drill must be efficient.
•   Value per byte is extremely low with BigData.
•   Overhead in the inner loop must be kept to a minimum.
•   Java was found inefficient for general number
    crunching (such as data compression). The main
    problem with Java is that GC overhead is unavoidable
    for the whole data corpus being scanned. We went so
    far as to keep all data in byte arrays and auto-generate
    transformation code; it still underperformed, and
    code complexity went through the roof.
Suggested Architecture
[Diagram: three columns, left to right]
• Browser / Client
• Single-Tenant Frontend, running inside a traditional guest VM (JVM)
• Multi-Tenant Backend: scale-out object store and in-situ compute
Labels in the diagram: Query -> Query Compiler -> Custom executable job
OpenDremel/Dazo
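[Diagram: same three-column layout, annotated with the current state of each part]
• Client side: two separate unfinished jQuery apps and a command-line app, with no particular codenames
• Frontend (JVM): we call it Metaxa (historic reasons); BQL parser and an unfinished compiler based on Apache Velocity
• Backend: we call it Zwift (Swift + ZeroVM); alpha quality
Labels in the diagram: Query -> Query Compiler -> Custom executable job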
What is Swift?




“Swift is a highly available, distributed,
eventually consistent object/blob store.
Organizations can use Swift to store
lots of data efficiently, safely, and
cheaply.”
Haven’t got it?



Swift is THE open-source
   implementation of
        Amazon S3
What is ZeroVM?




Highly-secure, low-overhead, low-latency container-style
virtualization based on Google Native Client project. The
critical security code is transferred verbatim from Chrome
Browser project and therefore is as secure as Chrome
Browser. More info: http://ZeroVM.org and
http://news.ycombinator.com/item?id=3746222
ZeroVM highlights

1.   Disposable VM per request
2.   HyperElasticity per request
3.   Embeddable into everything
4.   High-performance (x86/ARM)
5.   Erlang inspired clustering
6.   Written in pure C, no deps
Haven’t got it?


ZeroVM to Virtualization
        is what
SQLite is to Databases
Where is the code?

• OpenDremel (1st generation design):
   – http://code.google.com/p/dremel/source/browse?repo=dremel
   – http://code.google.com/p/dremel/source/browse?repo=metaxa

• Dazo (2nd generation design):
   – https://github.com/Dazo-org
Thanks
Camuel Gilyadov,
Email: Camuel@BigDataCraft.com

Apache Drill (ver. 0.1, check ver. 0.2)

  • 1. Apache Drill: Design proposal from the OpenDremel team. Camuel Gilyadov & Constantine Peresypkin, Email: Camuel@BigDataCraft.com
  • 2. OpenDremel Story: 2010 • Camuel Gilyadov started a Dremel implementation, named OpenDremel, in summer 2010. • David Gruzman joined the effort a few months later, followed by Constantine Peresypkin. • There wasn’t a comprehensive design or architecture. The goal was to get the hierarchical-columnar transformation working smoothly and in strict accordance with the Dremel paper. Several working implementations were published by us under the Apache License. • Hong San was hired as the first full-timer to speed up development. The Metaxa milestone was set.
  • 3. OpenDremel Story: 2011 • The early OpenDremel design was found too naive, mainly due to Java underperformance in inner number-crunching loops. • After fierce brainstorming, the project was restarted from scratch under the new name Dazo. With Dazo, a query plan is an arbitrary piece of executable native code with a Java frontend. • From then on we took inspiration from BigQuery rather than from the Dremel paper. • We decided to use Google NaCl as the sandboxing technology to isolate queries as well as to meter resource consumption. The new sandbox was named ZeroVM. • For storage we decided to use OpenStack Swift.
  • 4. OpenDremel Story: 2012 • With four people full-time and several others part-time, we still don’t have a fully integrated version, but we are satisfied with what we have achieved and convinced that the decisions behind Dazo were correct. • We believe ZeroVM could be a disruptive technology in itself, revolutionizing the BigData@Cloud space. • We are excited by the Apache Drill initiative and hope to be useful to it.
  • 5. Design Tenet #1 • Apache Drill must support multi-tenant semantics internally and must not be run in guest VMs at all. • It should be inspired by BigQuery and not only by the Dremel/PowerDrill/Tenzing papers. • It is not practical to set up a dedicated cloud (billed hourly) just to run a query for a few seconds. • The codebase must be clearly divided into a trusted part and an untrusted part. The trusted part must be kept to an absolute minimum and must be peer-reviewed, secured, audited and metered.
  • 6. Design Tenet #2 • Apache Drill must be extremely flexible and customizable. • The schema-on-read concept must be supported. It must be possible to embed imperative high-performance parser code into the query. • SQL is no longer enough. It must be easy to add new query languages as plug-ins or as user-defined functions (UDFs). • Additionally, various data formats must be supported, such as column stores, row stores, PAX, RCFile, etc.
  • 7. Design Tenet #2 (cont.) • We suggest that the query plan format be relaxed to arbitrary distributed executable code and the data format be relaxed to an arbitrary opaque BLOB. • This way new query languages and new data formats could be easily supported without changing the backend. • As an added benefit, the backend becomes a generic, lightweight, homogeneous compute-storage cloud. • Such an approach exhibits good separation of control: the cloud operator controls and bills for generic infrastructure, and the query engine is left completely in the control of the tenant/user.
  • 8. Design Tenet #3 • Apache Drill requests/queries must be hyper-elastic, meaning able to exploit the compute capacity of thousands of servers for a short duration of just a few seconds. No resources may be kept spinning per user between queries or when idle. • Traditional VMs are too heavyweight for that. Container approaches such as OpenVZ/LXC are not secure enough in a multi-tenancy context. • We suggest making sandboxing pluggable and supporting ZeroVM (developed for OpenDremel) and LXC (fine for private clouds) to begin with.
  • 9. Design Tenet #4 • Apache Drill must be efficient. • Value-per-byte is extremely low with BigData. • Overhead in the inner loop must be kept to a minimum. • Java was found inefficient for general number crunching (such as data compression). The main problem with Java is that GC overhead is unavoidable for the whole data corpus being scanned. We went so far as to keep all data in byte arrays and auto-generate transformation code; it still underperformed, and code complexity went through the roof.
  • 10. Suggested Architecture [diagram] • Browser / Client sends a Query to the frontend. • Single-Tenant Frontend running inside a traditional guest VM (JVM) hosts the Query Compiler and emits a Custom executable job. • Multi-Tenant Backend: scale-out object store and in-situ compute.
  • 11. OpenDremel/Dazo [same diagram, annotated with current status] • Client: two separate unfinished jQuery apps & a cmdline app, with no particular codenames. • Frontend (JVM, Query Compiler): we call it Metaxa (historic reasons); BQL parser, unfinished compiler based on Apache Velocity. • Backend (receives the Custom executable job): we call it Zwift (Swift + ZeroVM); alpha quality.
  • 12. What is Swift? “Swift is a highly available, distributed, eventually consistent object/blob store. Organizations can use Swift to store lots of data efficiently, safely, and cheaply.”
  • 13. Haven’t got it? Swift is THE open-source implementation of Amazon S3
  • 14. What is ZeroVM? Highly secure, low-overhead, low-latency container-style virtualization based on the Google Native Client project. The critical security code is taken verbatim from the Chrome browser project and is therefore as secure as the Chrome browser. More info: http://ZeroVM.org and http://news.ycombinator.com/item?id=3746222
  • 15. ZeroVM highlights 1. Disposable VM per request 2. HyperElasticity per request 3. Embeddable into everything 4. High-performance (x86/ARM) 5. Erlang-inspired clustering 6. Written in pure C, no deps
  • 16. Haven’t got it? ZeroVM to Virtualization is what SQLite is to Databases
  • 17. Where is the code? • OpenDremel (1st generation design): – http://code.google.com/p/dremel/source/browse?repo=dremel – http://code.google.com/p/dremel/source/browse?repo=metaxa • Dazo (2nd generation design): – https://github.com/Dazo-org