Apache’s Answer to Low Latency
 Interactive Query for Big Data
         February 19, 2013




Who am I?

http://www.mapr.com/company/events/njh-2-19-2013
• Keys Botzum
• kbotzum@maprtech.com
• Senior Principal Technologist, MapR Technologies




Agenda

•   Apache Drill overview
•   Key features
•   Status and progress
•   Discuss potential use cases and cooperation




Big Data Workloads
•   ETL
•   Data mining
•   Blob store
•   Lightweight OLTP on large datasets
•   Index and model generation
•   Web crawling
•   Stream processing
•   Clustering, anomaly detection and classification
•   Interactive analysis


Example Problem
• Jane works as an analyst at an e-commerce company
• How does she figure out good targeting segments for the next marketing campaign?
• She has some ideas and lots of data
   – Transaction information
   – User profiles
   – Access logs



Solving the Problem with Traditional Systems

• Use an RDBMS
   – ETL the data from MongoDB and Hadoop into the RDBMS
       • MongoDB data must be flattened, schematized, filtered and aggregated
       • Hadoop data must be filtered and aggregated
   – Query the data using any SQL-based tool
• Use MapReduce
   – ETL the data from Oracle and MongoDB into Hadoop
   – Work with the MapReduce team to generate the desired analyses
• Use Hive
   – ETL the data from Oracle and MongoDB into Hadoop
       • MongoDB data must be flattened and schematized
   – But HiveQL is limited, queries take too long and BI tool support is
     limited
• Challenges: data movement, loss of nesting structure, latency


WWGD

Distributed File System    NoSQL       Interactive analysis    Batch processing
GFS                        BigTable    Dremel                  MapReduce
HDFS                       HBase       ???                     Hadoop MapReduce

Build Apache Drill to provide a true open source solution to interactive analysis of Big Data

Google Dremel
• Interactive analysis of large-scale datasets
   –   Trillion records at interactive speeds
   –   Complementary to MapReduce
   –   Used by thousands of Google employees
   –   Paper published at VLDB 2010
• Model
   – Nested data model with schema
      • Most data at Google is stored/transferred in Protocol Buffers
      • Normalization (to relational) is prohibitive
   – SQL-like query language with nested data support
• Implementation
   – Column-based storage and processing
   – In-situ data access (GFS and Bigtable)
   – Tree architecture as in Web search (and databases)
Innovations
• MapReduce
  – Highly parallel algorithms running on commodity systems can deliver real
    value at reasonable cost
  – Scalable IO and compute trumps efficiency with today's commodity hardware
  – With many datasets, schemas and indexes are limiting
  – Flexibility is more important than efficiency
  – An easy, scalable, fault tolerant execution framework is key for large clusters
• Dremel
  –   Columnar storage provides significant performance benefits at scale
  –   Columnar storage with nesting preserves structure and can be very efficient
  –   Avoiding final record assembly as long as possible improves efficiency
  –   Optimizing for the query use case can avoid the full generality of MR and thus
      significantly reduce latency. E.g., no need to start JVMs, just push compact
      queries to running agents.
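
The columnar storage points above can be illustrated with a toy sketch. This is not Dremel's or Drill's storage format; plain Python lists stand in for column files, and the sample records and the `to_columns` helper are assumptions made up for illustration.

```python
# Toy illustration only: plain Python lists stand in for on-disk column files.
rows = [
    {"user": "homer", "country": "US", "amount": 12.5},
    {"user": "marge", "country": "US", "amount": 30.0},
    {"user": "apu", "country": "IN", "amount": 7.25},
]


def to_columns(records):
    """Pivot a list of flat records into one list per field (hypothetical helper)."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# A query such as SUM(amount) only needs to scan the "amount" column;
# the "user" and "country" values are never read, and no full record
# has to be reassembled to produce the answer.
total_amount = sum(columns["amount"])
print(total_amount)
```

In a row-oriented layout the same aggregate would have to read every field of every record, which is the cost that columnar execution avoids.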


Apache Drill Overview
• Inspired by Google Dremel/BigQuery … more ambitious
• Interactive analysis of Big Data using standard SQL
• Fast
    – Low latency queries
    – Columnar execution
    – Complement native interfaces and MapReduce/Hive/Pig
• Open
    – Community driven open source project
    – Under Apache Software Foundation
• Modern
    – Standard ANSI SQL:2003 (select/into)
    – Nested/hierarchical data support
    – Schema is optional
    – Supports RDBMS, Hadoop and NoSQL
    – Extensible

Apache Drill: interactive queries, data analyst, reporting (100 ms-20 min)
MapReduce, Hive, Pig: data mining, modeling, large ETL (20 min-20 hr)

How Does It Work?
• Drillbits run on each node, designed to maximize data locality
• Processing is done outside MapReduce paradigm (but possibly within YARN)
• Queries can be fed to any Drillbit
• Coordination, query planning, optimization, scheduling, and execution are distributed

SELECT * FROM
  oracle.transactions,
  mongo.users,
  hdfs.events
LIMIT 1




Key Features

•   Full SQL (ANSI SQL:2003)
•   Nested data
•   Schema is optional
•   Flexible and extensible architecture




Full SQL (ANSI SQL:2003)
• Drill supports standard ANSI SQL:2003
   – Correlated subqueries, analytic functions, …
   – SQL-like is not enough
• Use any SQL-based tool with Apache Drill
   – Tableau, MicroStrategy, Excel, SAP Crystal Reports, Toad, SQuirreL, …
   – Standard ODBC and JDBC drivers

[Diagram: Client (Tableau, MicroStrategy, Excel, SAP Crystal Reports) with Drill ODBC Driver; Drillbit with SQL Query Parser and Query Planner; Drillbits with Drill Workers]

Nested Data
• Nested data is becoming prevalent
   – JSON, BSON, XML, Protocol Buffers, Avro, etc.
   – The data source may or may not be aware
       • MongoDB supports nested data natively
       • A single HBase value could be a JSON document (compound nested type)
   – Google Dremel’s innovation was efficient columnar storage and querying of nested data
• Flattening nested data is error-prone and very difficult
• Apache Drill supports nested data
   – Extensions to ANSI SQL:2003

JSON
{
  "name": "Homer",
  "gender": "Male",
  "followers": 100,
  "children": [
    {"name": "Bart"},
    {"name": "Lisa"}
  ]
}

Avro
enum Gender {
  MALE, FEMALE
}

record User {
  string name;
  Gender gender;
  long followers;
}
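
To make the "flattening is error-prone" point concrete, here is a minimal sketch that flattens the Homer record from this slide into relational-style rows. The `flatten` helper and the `child_name` column are assumptions for illustration; they are not part of Drill.

```python
import json

# The JSON record from the slide, written as valid JSON.
record = json.loads("""
{
  "name": "Homer",
  "gender": "Male",
  "followers": 100,
  "children": [{"name": "Bart"}, {"name": "Lisa"}]
}
""")


def flatten(user):
    """Emit one flat row per child, repeating the parent's fields (hypothetical helper)."""
    return [
        {
            "name": user["name"],
            "gender": user["gender"],
            "followers": user["followers"],
            "child_name": child["name"],
        }
        for child in user["children"]
    ]


flat_rows = flatten(record)

# The parent's fields are now duplicated on every row, so a naive
# aggregate over the flattened table counts Homer's followers twice.
naive_followers = sum(row["followers"] for row in flat_rows)
print(len(flat_rows), naive_followers)
```

With this naive approach a user with no children produces no rows at all and silently disappears from the flattened table, which is one more way the nesting structure gets lost.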




Schema is Optional
• Many data sources do not have rigid schemas
     – Schemas change rapidly
     – Each record may have a different schema
          • Sparse and wide rows in HBase and Cassandra, MongoDB
• Apache Drill supports querying against unknown schemas
     – Query any HBase, Cassandra or MongoDB table
• User can define the schema or let the system discover it automatically
     – System of record may already have schema information
          • Why manage it in a separate system?
     – No need to manage schema evolution
Row Key             CF contents                 CF anchor
"com.cnn.www"       contents:html = "<html>…"   anchor:my.look.ca = "CNN.com"
                                                anchor:cnnsi.com = "CNN"
"com.foxnews.www"   contents:html = "<html>…"   anchor:en.wikipedia.org = "Fox News"

…                   …                           …
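
As a rough sketch of what "let the system discover it automatically" could mean, the snippet below scans rows modeled on the table above and collects every column it sees. The `discover_schema` function is hypothetical and is not Drill's actual mechanism; it only illustrates that a schema can be derived from the data when each row carries its own columns.

```python
# Rows modeled on the table above; each row carries its own set of columns.
rows = {
    "com.cnn.www": {
        "contents:html": "<html>…",
        "anchor:my.look.ca": "CNN.com",
        "anchor:cnnsi.com": "CNN",
    },
    "com.foxnews.www": {
        "contents:html": "<html>…",
        "anchor:en.wikipedia.org": "Fox News",
    },
}


def discover_schema(table):
    """Map every column seen in any row to the value types observed for it."""
    schema = {}
    for columns in table.values():
        for column, value in columns.items():
            schema.setdefault(column, set()).add(type(value).__name__)
    return schema


schema = discover_schema(rows)
for column in sorted(schema):
    print(column, sorted(schema[column]))
```

No row contains all four columns, yet the discovered schema covers them all, so nothing has to be declared up front or migrated when a new anchor column appears.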


Flexible and Extensible Architecture
• Apache Drill is designed for extensibility
• Well-documented APIs and interfaces
• Data sources and file formats
    – Implement a custom scanner to support a new data source or file format
• Query languages
    – SQL:2003 is the primary language
    – Implement a custom Parser to support a Domain Specific Language

• Optimizers
    – Drill will have a cost-based optimizer
    – Clear surrounding APIs support easy optimizer exploration
• Operators
    – Custom operators can be implemented
         • Special operators for Mahout (k-means) being designed
    – Operator push-down to data source (RDBMS)



Architecture



• Only the execution engine knows the physical attributes of the cluster
    – # nodes, hardware, file locations, …

• Public interfaces enable extensibility
    – Developers can build parsers for new query languages
    – Developers can provide an execution plan directly

• Each level of the plan has a human readable representation
    – Facilitates debugging and unit testing


Status: In Progress
•   Heavy active development by multiple organizations
•   Available
     – Logical plan syntax and interpreter
     – Reference interpreter
•   In progress
     – SQL interpreter
     – Storage engine implementations for Accumulo, Cassandra, HBase and various file formats
•   Significant community momentum
     –   Over 200 people on the Drill mailing list
     –   Over 200 members of the Bay Area Drill User Group
     –   Drill meetups across the US and Europe
     –   OpenDremel team joined Apache Drill
•   Anticipated schedule:
     – Prototype: Q1
     – Alpha: Q2
     – Beta: Q3




Why Apache Drill Will Be Successful
Resources
• Contributors have strong backgrounds from companies like Oracle, IBM Netezza, Informatica, Clustrix and Pentaho

Community
• Development done in the open
• Active contributors from multiple companies
• Rapidly growing

Architecture
• Full SQL
• New data support
• Extensible APIs
• Full Columnar Execution
• Beyond Hadoop




MapR’s Innovations for Hadoop


• NFS direct access
   • Makes Hadoop file system look like any file system
   • Simplifies access to data in a Hadoop cluster
   • Enables non-Hadoop programs to access the data: you know,
      the existing important applications you already have!
• Transparent compression
   • Saves space and thus $$$
• Web, command line, and REST based management tools
   • Reduces the burden on your admin teams

MapR’s Innovations for Hadoop


• Eliminates single points of failure
   • Self healing with automated stateful failover
• Protects your data
   • Snapshots for point-in-time data protection and recovery
   • Mirroring for business continuity includes wide area
      replication support
• More scalable
   • Central Name Node eliminated
   • Hundreds of billions of files/cluster – over a billion/node
   • File creation rates of over 1000/sec/node
MapR’s Innovations for Hadoop


• Speeds jobs by up to 4X
   • 50% - 400% faster than other Hadoop distributions
     depending on benchmark and hardware
• Google and MapR demonstrated Terasort world record
   • http://www.mapr.com/mapr-google
• How did we do it?
   • Lots of C/C++ to avoid Java overhead
   • Raw disk IO
   • Application level NIC bonding
   • Numerous other optimizations in key components

MapR in the Cloud
• Available as a service with Google Compute Engine




• Available as a service with Amazon Elastic MapReduce (EMR)
   – http://aws.amazon.com/elasticmapreduce/mapr




Three Editions
• All Hadoop API compatible, majority open
  source components, full Hadoop stack
• M3
  – Faster, easier to use, better integration
• M5
  – Improving reliability and dependability
• M7
  – HBase APIs on a more performant, scalable, and
    dependable platform (MapR Data Platform)

Questions?

• What problems can Drill solve for you?
• Where does it fit in the organization?
• Which data sources and BI tools are important
  to you?




References
• Google’s Dremel
   – http://research.google.com/pubs/pub36632.html
• Google’s BigQuery
   – https://developers.google.com/bigquery/docs/query-reference
• Microsoft’s Dryad
   – Distributed execution engine
   – http://research.microsoft.com/en-us/projects/dryad/
• MIT’s C-Store – a columnar database
   – http://db.csail.mit.edu/projects/cstore/
• Google’s Protobufs
   – https://developers.google.com/protocol-buffers/docs/proto

• How Apache projects work
   – http://www.apache.org/foundation/how-it-works.html

Get Involved!
• Download these slides
   – http://www.mapr.com/company/events/njh-2-19-2013

• Apache Drill Project Information
   – http://www.mapr.com/drill
   – http://incubator.apache.org/drill
   – Join the mailing list and help: drill-dev-subscribe@incubator.apache.org

• Join MapR
   – jobs@mapr.com


APPENDIX


How Does Impala Fit In?

Impala Strengths
•   Beta currently available
•   Easy install and setup on top of Cloudera
•   Faster than Hive on some queries
•   SQL-like query language

Questions
•   Open Source ‘Lite’
•   Doesn’t support RDBMS or other NoSQLs (beyond Hadoop/HBase)
•   Early row materialization increases footprint and reduces performance
•   Limited file format support
•   Query results must fit in memory!
•   Rigid schema is required
•   No support for nested data
•   Compound APIs restrict optimizer progression
•   SQL-like (not SQL)

Many important features are “coming soon”. Architectural foundation is constrained. No community development.

Why Not Leverage MapReduce?
• Scheduling Model
   – Coarse resource model reduces hardware utilization
   – Acquisition of resources typically takes hundreds of milliseconds to seconds
• Barriers
   – Map completion required before shuffle/reduce
     commencement
   – All maps must complete before reduce can start
   – In chained jobs, one job must finish entirely before the next one
     can start
• Persistence and Recoverability
   – Data is persisted to disk between each barrier
   – Serialization and deserialization are required between execution
     phases
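
The effect of barriers on latency can be sketched with a toy comparison. This is not how Hadoop or Drill is implemented; `staged`, `pipelined`, `parse`, and `double` are hypothetical names, and the work log simply counts how many stage invocations happen before a result is available.

```python
def staged(records, stages):
    """Barrier style: each stage finishes over all records before the next starts."""
    work_log = []
    data = list(records)
    for stage in stages:
        out = []
        for item in data:
            work_log.append(stage.__name__)
            out.append(stage(item))
        data = out  # stands in for persisting the intermediate result
    return data, work_log


def pipelined(records, stages, work_log):
    """Pipelined style: each record flows through every stage before the next record."""
    for item in records:
        for stage in stages:
            work_log.append(stage.__name__)
            item = stage(item)
        yield item


def parse(x):
    return int(x)


def double(x):
    return x * 2


inputs = ["1", "2", "3"]

# With barriers, all six stage invocations run before any result is visible.
staged_result, staged_log = staged(inputs, [parse, double])

# Pipelined, the first result is available after only two invocations.
pipe_log = []
first = next(pipelined(inputs, [parse, double], pipe_log))
print(staged_result, len(staged_log), first, len(pipe_log))
```

For an interactive query such as one ending in LIMIT 1, the pipelined form can return as soon as the first record has passed through every stage, while the barrier form pays for the whole dataset first.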


                                                                           31

Contenu connexe

Tendances

Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012Jonathan Seidman
 
A Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at FacebookA Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at FacebookBigDataCloud
 
Integration of HIve and HBase
Integration of HIve and HBaseIntegration of HIve and HBase
Integration of HIve and HBaseHortonworks
 
Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...
Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...
Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...Cloudera, Inc.
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop EcosystemJ Singh
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010Jonathan Seidman
 
Hadoop and Hive at Orbitz, Hadoop World 2010
Hadoop and Hive at Orbitz, Hadoop World 2010Hadoop and Hive at Orbitz, Hadoop World 2010
Hadoop and Hive at Orbitz, Hadoop World 2010Jonathan Seidman
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2tcloudcomputing-tw
 
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesSQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesOReillyStrata
 
Deploying Grid Services Using Hadoop
Deploying Grid Services Using HadoopDeploying Grid Services Using Hadoop
Deploying Grid Services Using HadoopGeorge Ang
 
Big data processing with apache spark part1
Big data processing with apache spark   part1Big data processing with apache spark   part1
Big data processing with apache spark part1Abbas Maazallahi
 
Apache hadoop introduction and architecture
Apache hadoop  introduction and architectureApache hadoop  introduction and architecture
Apache hadoop introduction and architectureHarikrishnan K
 
Big Data Warehousing: Pig vs. Hive Comparison
Big Data Warehousing: Pig vs. Hive ComparisonBig Data Warehousing: Pig vs. Hive Comparison
Big Data Warehousing: Pig vs. Hive ComparisonCaserta
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 
Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Gavin Heavyside
 
Using hadoop to expand data warehousing
Using hadoop to expand data warehousingUsing hadoop to expand data warehousing
Using hadoop to expand data warehousingDataWorks Summit
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataWANdisco Plc
 

Tendances (20)

Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
 
A Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at FacebookA Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
 
Integration of HIve and HBase
Integration of HIve and HBaseIntegration of HIve and HBase
Integration of HIve and HBase
 
Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...
Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...
Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop Ecosystem
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
 
Hadoop and Hive at Orbitz, Hadoop World 2010
Hadoop and Hive at Orbitz, Hadoop World 2010Hadoop and Hive at Orbitz, Hadoop World 2010
Hadoop and Hive at Orbitz, Hadoop World 2010
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
 
1. Apache HIVE
1. Apache HIVE1. Apache HIVE
1. Apache HIVE
 
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesSQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
 
Hadoop Family and Ecosystem
Hadoop Family and EcosystemHadoop Family and Ecosystem
Hadoop Family and Ecosystem
 
Deploying Grid Services Using Hadoop
Deploying Grid Services Using HadoopDeploying Grid Services Using Hadoop
Deploying Grid Services Using Hadoop
 
Big data processing with apache spark part1
Big data processing with apache spark   part1Big data processing with apache spark   part1
Big data processing with apache spark part1
 
Apache hadoop introduction and architecture
Apache hadoop  introduction and architectureApache hadoop  introduction and architecture
Apache hadoop introduction and architecture
 
Big Data Warehousing: Pig vs. Hive Comparison
Big Data Warehousing: Pig vs. Hive ComparisonBig Data Warehousing: Pig vs. Hive Comparison
Big Data Warehousing: Pig vs. Hive Comparison
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010
 
Using hadoop to expand data warehousing
Using hadoop to expand data warehousingUsing hadoop to expand data warehousing
Using hadoop to expand data warehousing
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 

En vedette

USO DE LAS ESTRATEGIAS EXPOSITIVAS PARA EL DESARROLLO DE ACTITUDES Y VALORES
USO DE LAS ESTRATEGIAS EXPOSITIVAS PARA EL DESARROLLO DE ACTITUDES Y VALORESUSO DE LAS ESTRATEGIAS EXPOSITIVAS PARA EL DESARROLLO DE ACTITUDES Y VALORES
USO DE LAS ESTRATEGIAS EXPOSITIVAS PARA EL DESARROLLO DE ACTITUDES Y VALORESChus Fernández de la Fuente
 
Chicago Hadoop in Finance - Ted Dunning
Chicago Hadoop in Finance - Ted DunningChicago Hadoop in Finance - Ted Dunning
Chicago Hadoop in Finance - Ted DunningMapR Technologies
 
เรื่องที่ 2 แหล่งสารสนเทศ
เรื่องที่ 2 แหล่งสารสนเทศเรื่องที่ 2 แหล่งสารสนเทศ
เรื่องที่ 2 แหล่งสารสนเทศMarg Kok
 
Hadoop Summit EU - Crowd Sourcing Reflected Intelligence
Hadoop Summit EU - Crowd Sourcing Reflected IntelligenceHadoop Summit EU - Crowd Sourcing Reflected Intelligence
Hadoop Summit EU - Crowd Sourcing Reflected IntelligenceMapR Technologies
 
The Last Traffic Jam - LatAm Spanish
The Last Traffic Jam - LatAm SpanishThe Last Traffic Jam - LatAm Spanish
The Last Traffic Jam - LatAm SpanishConnected Futures
 
Strata+Hadoop 2015 Keynote: Impacting Business as it Happens
Strata+Hadoop 2015 Keynote: Impacting Business as it HappensStrata+Hadoop 2015 Keynote: Impacting Business as it Happens
Strata+Hadoop 2015 Keynote: Impacting Business as it HappensMapR Technologies
 
La técnica expositiva
La técnica expositivaLa técnica expositiva
La técnica expositivaElmer Riveiro
 

En vedette (10)

Drill 1.0
Drill 1.0Drill 1.0
Drill 1.0
 
USO DE LAS ESTRATEGIAS EXPOSITIVAS PARA EL DESARROLLO DE ACTITUDES Y VALORES
USO DE LAS ESTRATEGIAS EXPOSITIVAS PARA EL DESARROLLO DE ACTITUDES Y VALORESUSO DE LAS ESTRATEGIAS EXPOSITIVAS PARA EL DESARROLLO DE ACTITUDES Y VALORES
USO DE LAS ESTRATEGIAS EXPOSITIVAS PARA EL DESARROLLO DE ACTITUDES Y VALORES
 
Chicago Hadoop in Finance - Ted Dunning
Chicago Hadoop in Finance - Ted DunningChicago Hadoop in Finance - Ted Dunning
Chicago Hadoop in Finance - Ted Dunning
 
Dunning ml-conf-2014
Dunning ml-conf-2014Dunning ml-conf-2014
Dunning ml-conf-2014
 
Atlhug 20150625
Atlhug 20150625Atlhug 20150625
Atlhug 20150625
 
เรื่องที่ 2 แหล่งสารสนเทศ
เรื่องที่ 2 แหล่งสารสนเทศเรื่องที่ 2 แหล่งสารสนเทศ
เรื่องที่ 2 แหล่งสารสนเทศ
 
Hadoop Summit EU - Crowd Sourcing Reflected Intelligence
Hadoop Summit EU - Crowd Sourcing Reflected IntelligenceHadoop Summit EU - Crowd Sourcing Reflected Intelligence
Hadoop Summit EU - Crowd Sourcing Reflected Intelligence
 
The Last Traffic Jam - LatAm Spanish
The Last Traffic Jam - LatAm SpanishThe Last Traffic Jam - LatAm Spanish
The Last Traffic Jam - LatAm Spanish
 
Strata+Hadoop 2015 Keynote: Impacting Business as it Happens
Strata+Hadoop 2015 Keynote: Impacting Business as it HappensStrata+Hadoop 2015 Keynote: Impacting Business as it Happens
Strata+Hadoop 2015 Keynote: Impacting Business as it Happens
 
La técnica expositiva
La técnica expositivaLa técnica expositiva
La técnica expositiva
 

Similaire à Drill njhug -19 feb2013

An introduction to apache drill presentation
An introduction to apache drill presentationAn introduction to apache drill presentation
An introduction to apache drill presentationMapR Technologies
 
Understanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillUnderstanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillDataWorks Summit
 
Hadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 MarchHadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 MarchMapR Technologies
 
Real time hadoop + mapreduce intro
Real time hadoop + mapreduce introReal time hadoop + mapreduce intro
Real time hadoop + mapreduce introGeoff Hendrey
 
Processing Big Data
Processing Big DataProcessing Big Data
Processing Big Datacwensel
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSatish Mohan
 
2013 year of real-time hadoop
2013 year of real-time hadoop2013 year of real-time hadoop
2013 year of real-time hadoopGeoff Hendrey
 
No sql and sql - open analytics summit
No sql and sql - open analytics summitNo sql and sql - open analytics summit
No sql and sql - open analytics summitOpen Analytics
 
4. hadoop גיא לבנברג
4. hadoop  גיא לבנברג4. hadoop  גיא לבנברג
4. hadoop גיא לבנברגTaldor Group
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014cdmaxime
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...Institute of Contemporary Sciences
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...Qian Lin
 
Swiss Big Data User Group - Introduction to Apache Drill
Swiss Big Data User Group - Introduction to Apache DrillSwiss Big Data User Group - Introduction to Apache Drill
Swiss Big Data User Group - Introduction to Apache DrillMapR Technologies
 
Hadoop on Azure, Blue elephants
Hadoop on Azure,  Blue elephantsHadoop on Azure,  Blue elephants
Hadoop on Azure, Blue elephantsOvidiu Dimulescu
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft PlatformJesus Rodriguez
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoopbddmoscow
 

Drill njhug -19 feb2013

  • 1. Apache’s Answer to Low Latency Interactive Query for Big Data • February 19, 2013
  • 2. Who am I? http://www.mapr.com/company/events/njh-2-19-2013 • Keys Botzum • kbotzum@maprtech.com • Senior Principal Technologist, MapR Technologies
  • 3. Agenda • Apache Drill overview • Key features • Status and progress • Discuss potential use cases and cooperation
  • 4. Big Data Workloads • ETL • Data mining • Blob store • Lightweight OLTP on large datasets • Index and model generation • Web crawling • Stream processing • Clustering, anomaly detection and classification • Interactive analysis
  • 5. Example Problem • Jane works as an analyst at an e-commerce company • How does she figure out good targeting segments for the next marketing campaign? • She has some ideas and lots of data • Data sources shown: transaction information, user profiles, access logs
  • 6. Solving the Problem with Traditional Systems • Use an RDBMS – ETL the data from MongoDB and Hadoop into the RDBMS • MongoDB data must be flattened, schematized, filtered and aggregated • Hadoop data must be filtered and aggregated – Query the data using any SQL-based tool • Use MapReduce – ETL the data from Oracle and MongoDB into Hadoop – Work with the MapReduce team to generate the desired analyses • Use Hive – ETL the data from Oracle and MongoDB into Hadoop • MongoDB data must be flattened and schematized – But HiveQL is limited, queries take too long and BI tool support is limited • Challenges: data movement, loss of nesting structure, latency
  • 7. WWGD • Google: GFS (distributed file system), BigTable (NoSQL), Dremel (interactive analysis), MapReduce (batch processing) • Hadoop: HDFS (distributed file system), HBase (NoSQL), ??? (interactive analysis), MapReduce (batch processing) • Build Apache Drill to provide a true open source solution to interactive analysis of Big Data
  • 8. Google Dremel • Interactive analysis of large-scale datasets – Trillion records at interactive speeds – Complementary to MapReduce – Used by thousands of Google employees – Paper published at VLDB 2010 • Model – Nested data model with schema • Most data at Google is stored/transferred in Protocol Buffers • Normalization (to relational) is prohibitive – SQL-like query language with nested data support • Implementation – Column-based storage and processing – In-situ data access (GFS and Bigtable) – Tree architecture as in Web search (and databases)
  • 9. Innovations • MapReduce – Highly parallel algorithms running on commodity systems can deliver real value at reasonable cost – Scalable IO and compute trumps efficiency with today's commodity hardware – With many datasets, schemas and indexes are limiting – Flexibility is more important than efficiency – An easy, scalable, fault tolerant execution framework is key for large clusters • Dremel – Columnar storage provides significant performance benefits at scale – Columnar storage with nesting preserves structure and can be very efficient – Avoiding final record assembly as long as possible improves efficiency – Optimizing for the query use case can avoid the full generality of MR and thus significantly reduce latency. E.g., no need to start JVMs, just push compact queries to running agents.
  • 10. Apache Drill Overview • Inspired by Google Dremel/BigQuery … more ambitious • Interactive analysis of Big Data using standard SQL • Fast – Low latency queries – Columnar execution – Complement native interfaces and MapReduce/Hive/Pig • Open – Community driven open source project – Under Apache Software Foundation • Modern – Standard ANSI SQL:2003 (select/into) – Nested/hierarchical data support – Schema is optional – Supports RDBMS, Hadoop and NoSQL – Extensible • Diagram: Apache Drill handles interactive queries (data analyst, reporting; 100 ms-20 min); MapReduce, Hive and Pig handle data mining, modeling and large ETL (20 min-20 hr)
  • 11. How Does It Work? • Drillbits run on each node, designed to maximize data locality • Processing is done outside MapReduce paradigm (but possibly within YARN) • Queries can be fed to any Drillbit • Coordination, query planning, optimization, scheduling, and execution are distributed • Example query: SELECT * FROM oracle.transactions, mongo.users, hdfs.events LIMIT 1
  • 12. Key Features • Full SQL (ANSI SQL:2003) • Nested data • Schema is optional • Flexible and extensible architecture
  • 13. Full SQL (ANSI SQL:2003) • Drill supports standard ANSI SQL:2003 – Correlated subqueries, analytic functions, … – SQL-like is not enough • Use any SQL-based tool with Apache Drill – Tableau, MicroStrategy, Excel, SAP Crystal Reports, Toad, SQuirreL, … – Standard ODBC and JDBC drivers • Diagram labels: Client (Tableau, MicroStrategy, Excel, SAP Crystal Reports), Drill ODBC Driver, Drillbit, Driver, SQL Query Parser, Query Planner, Drillbits, Drill Worker
  • 14. Nested Data • Nested data is becoming prevalent – JSON, BSON, XML, Protocol Buffers, Avro, etc. – The data source may or may not be aware • MongoDB supports nested data natively • A single HBase value could be a JSON document (compound nested type) – Google Dremel’s innovation was efficient columnar storage and querying of nested data • Flattening nested data is error-prone and very difficult • Apache Drill supports nested data – Extensions to ANSI SQL:2003 • JSON example: { "name": "Homer", "gender": "Male", "followers": 100, "children": [ {"name": "Bart"}, {"name": "Lisa"} ] } • Avro example: enum Gender { MALE, FEMALE } record User { string name; Gender gender; long followers; }
  • 15. Schema is Optional • Many data sources do not have rigid schemas – Schemas change rapidly – Each record may have a different schema • Sparse and wide rows in HBase and Cassandra, MongoDB • Apache Drill supports querying against unknown schemas – Query any HBase, Cassandra or MongoDB table • User can define the schema or let the system discover it automatically – System of record may already have schema information • Why manage it in a separate system? – No need to manage schema evolution • Example sparse, wide-row table (columns: Row Key, CF contents, CF anchor) – "com.cnn.www": contents:html = "<html>…"; anchor:my.look.ca = "CNN.com", anchor:cnnsi.com = "CNN" – "com.foxnews.www": contents:html = "<html>…"; anchor:en.wikipedia.org = "Fox News" – …
  • 16. Flexible and Extensible Architecture • Apache Drill is designed for extensibility • Well-documented APIs and interfaces • Data sources and file formats – Implement a custom scanner to support a new data source or file format • Query languages – SQL:2003 is the primary language – Implement a custom Parser to support a Domain Specific Language • Optimizers – Drill will have a cost-based optimizer – Clear surrounding APIs support easy optimizer exploration • Operators – Custom operators can be implemented • Special operators for Mahout (k-means) being designed – Operator push-down to data source (RDBMS)
  • 17. Architecture • Only the execution engine knows the physical attributes of the cluster – # nodes, hardware, file locations, … • Public interfaces enable extensibility – Developers can build parsers for new query languages – Developers can provide an execution plan directly • Each level of the plan has a human readable representation – Facilitates debugging and unit testing
  • 18. Status: In Progress • Heavy active development by multiple organizations • Available – Logical plan syntax and interpreter – Reference interpreter • In progress – SQL interpreter – Storage engine implementations for Accumulo, Cassandra, HBase and various file formats • Significant community momentum – Over 200 people on the Drill mailing list – Over 200 members of the Bay Area Drill User Group – Drill meetups across the US and Europe – OpenDremel team joined Apache Drill • Anticipated schedule: – Prototype: Q1 – Alpha: Q2 – Beta: Q3
  • 19. Why Apache Drill Will Be Successful • Resources – Contributors have strong backgrounds from companies like Oracle, IBM Netezza, Informatica, Clustrix and Pentaho • Community – Development done in the open – Active contributors from multiple companies – Rapidly growing • Architecture – Full SQL – New data support – Extensible APIs – Full Columnar Execution – Beyond Hadoop
  • 20. MapR’s Innovations for Hadoop • NFS direct access • Makes Hadoop file system look like any file system • Simplifies access to data in a Hadoop cluster • Enables non-Hadoop programs access to the data – you know, the existing important applications you already have! • Transparent compression • Saves space and thus $$$ • Web, command line, and REST based management tools • Reduces the burden on your admin teams
  • 21. MapR’s Innovations for Hadoop • Eliminates single points of failure • Self healing with automated stateful failover • Protects your data • Snapshots for point-in-time data protection and recovery • Mirroring for business continuity includes wide area replication support • More scalable • Central Name Node eliminated • Hundreds of billions of files/cluster – over a billion/node • File creation rates of over 1000/sec/node
  • 22. MapR’s Innovations for Hadoop • Speeds jobs by up to 4X • 50% - 400% faster than other Hadoop distributions depending on benchmark and hardware • Google and MapR demonstrated Terasort world record • http://www.mapr.com/mapr-google • How did we do it? • Lots of C/C++ to avoid Java overhead • Raw disk IO • Application level NIC bonding • Numerous other optimizations in key components
  • 23. MapR in the Cloud • Available as a service with Google Compute Engine • Available as a service with Amazon Elastic MapReduce (EMR) – http://aws.amazon.com/elasticmapreduce/mapr
  • 24. Three Editions • All Hadoop API compatible, majority open source components, full Hadoop stack • M3 – Faster, easier to use, better integration • M5 – Improving reliability and dependability • M7 – HBase APIs on a more performant, scalable, and dependable platform (MapR Data Platform)
  • 25. Questions? • What problems can Drill solve for you? • Where does it fit in the organization? • Which data sources and BI tools are important to you?
  • 26. References • Google’s Dremel – http://research.google.com/pubs/pub36632.html • Google’s BigQuery – https://developers.google.com/bigquery/docs/query-reference • Microsoft’s Dryad – Distributed execution engine – http://research.microsoft.com/en-us/projects/dryad/ • MIT’s C-Store – a columnar database – http://db.csail.mit.edu/projects/cstore/ • Google’s Protobufs – https://developers.google.com/protocol-buffers/docs/proto • How Apache projects work – http://www.apache.org/foundation/how-it-works.html
  • 27. Get Involved! • Download these slides – http://www.mapr.com/company/events/njh-2-19-2013 • Apache Drill Project Information – http://www.mapr.com/drill – http://incubator.apache.org/drill – Join the mailing list and help: drill-dev-subscribe@incubator.apache.org • Join MapR – jobs@mapr.com
  • 28. APPENDIX
  • 29. How Does Impala Fit In? • Impala Strengths – Beta currently available – Easy install and setup on top of Cloudera – Faster than Hive on some queries – SQL-like query language • Questions – Open Source ‘Lite’ – Doesn’t support RDBMS or other NoSQLs (beyond Hadoop/HBase) – Early row materialization increases footprint and reduces performance – Limited file format support – Query results must fit in memory! – Rigid schema is required – No support for nested data – Compound APIs restrict optimizer progression – SQL-like (not SQL) • Many important features are “coming soon”. Architectural foundation is constrained. No community development.
  • 30. Why Not Leverage MapReduce? • Scheduling Model – Coarse resource model reduces hardware utilization – Acquisition of resources typically takes hundreds of milliseconds to seconds • Barriers – Map completion required before shuffle/reduce commencement – All maps must complete before reduce can start – In chained jobs, one job must finish entirely before the next one can start • Persistence and Recoverability – Data is persisted to disk between each barrier – Serialization and deserialization are required between execution phases
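
The barrier cost described on slide 30 can be illustrated with a small Python sketch. This is not MapReduce or Drill code: `CountingSource`, `staged` and `pipelined` are hypothetical names invented here, and the toy query (parse integers, keep the even ones, return the first few) is an assumption chosen only to make the difference visible. The staged version finishes and materializes each phase before the next one starts; the pipelined version streams rows through and stops reading as soon as the limit is reached.

```python
import itertools
from typing import Iterator, List


class CountingSource:
    """Hypothetical input that records how many rows were actually read."""

    def __init__(self, rows: List[str]) -> None:
        self.rows = rows
        self.rows_read = 0

    def __iter__(self) -> Iterator[str]:
        for row in self.rows:
            self.rows_read += 1
            yield row


def staged(source: CountingSource, limit: int) -> List[int]:
    """Barrier style: every phase completes and is materialized first."""
    parsed = [int(row) for row in source]            # barrier after "map"
    evens = [value for value in parsed if value % 2 == 0]  # barrier after "filter"
    return evens[:limit]


def pipelined(source: CountingSource, limit: int) -> List[int]:
    """Streaming style: rows flow through lazily, one at a time."""
    parsed = (int(row) for row in source)
    evens = (value for value in parsed if value % 2 == 0)
    # islice stops pulling rows once `limit` results have been produced.
    return list(itertools.islice(evens, limit))
```

Both functions return the same rows; comparing `rows_read` on the source after each call shows how much input the barrier version touches for a query that needs only a handful of results.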

Editor's notes

  1. With the recent explosion of everything related to Hadoop, it is no surprise that new projects and implementations in the Hadoop ecosystem keep appearing. There have been quite a few initiatives that provide SQL interfaces into Hadoop. The Apache Drill project is a distributed system for interactive analysis of large-scale datasets, inspired by Google's Dremel. Drill is not trying to replace existing Big Data batch processing frameworks, such as Hadoop MapReduce, or stream processing frameworks, such as S4 or Storm. Rather, it fills an existing void: real-time interactive processing of large data sets. Technical detail: Similar to Dremel, the Drill implementation is based on the processing of nested, tree-like data. In Dremel this data is based on protocol buffers, a nested, schema-based data model. Drill is planning to extend this data model by adding further schema-based implementations, for example Apache Avro, and schema-less data models such as JSON and BSON. In addition to querying a single data structure, Drill is also planning to support “baby joins”: joins against small data structures that can be loaded into memory.
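
The “baby join” mentioned in the note can be sketched in a few lines of Python. This is an illustration only, not Drill's implementation or API: the function name `baby_join`, the record layout and the `user_id` key are all assumptions made for the example. The idea shown is that the small side is loaded into an in-memory hash table once, and the large side is then streamed past it row by row.

```python
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def baby_join(big_rows: Iterable[Record],
              small_rows: Iterable[Record],
              key: str) -> Iterator[Record]:
    """Inner-join a large stream against a small in-memory table."""
    # Build phase: the small side must fit in memory (the stated assumption).
    lookup: Dict[Any, List[Record]] = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)

    # Probe phase: the large side is never materialized, only streamed.
    for row in big_rows:
        for match in lookup.get(row[key], []):
            # Fields from the large row win if both sides share a name.
            yield {**match, **row}


# Hypothetical sample data: a small user table and a larger event stream.
users = [{"user_id": 1, "name": "Homer"}, {"user_id": 2, "name": "Marge"}]
events = [{"user_id": 2, "url": "/cart"},
          {"user_id": 3, "url": "/home"},
          {"user_id": 1, "url": "/buy"}]

for joined in baby_join(events, users, "user_id"):
    print(joined)
```

Because only the small side is held in memory, the large input can be arbitrarily long; rows with no match on the small side (user 3 above) are simply dropped, which makes this an inner join.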