SlideShare une entreprise Scribd logo
1  sur  25
Using Standard File-Based
                        Applications and SQL-Based
                                 Tools with Hadoop
©MapR Technologies - Confidential   1
http://info.mapr.com/HUG-7-2012

     Tomer Shiran
     tshiran@maprtech.com
     Director of Product Management, MapR Technologies




©MapR Technologies - Confidential   2
The MapR Distribution for Apache Hadoop

     The open, enterprise-grade distribution for Apache Hadoop
       –   Open source components
            •    Hive, Pig, Cascading, HBase, ZooKeeper, Oozie, Flume, Sqoop, Whirr, …
       –   Enhancements to make Hadoop more open and enterprise-grade


     Fastest growing distribution
       –   Thousands of clusters deployed


     Now available as a service with Amazon Elastic MapReduce (EMR)
       –   http://aws.amazon.com/elasticmapreduce/mapr




©MapR Technologies - Confidential                   3
MapR




         Make Hadoop                            Make Hadoop
          more open                            enterprise-grade



                       This presentation

©MapR Technologies - Confidential          4
Not All Applications Use the Hadoop APIs


                                           Applications and libraries
                                           that use files and/or SQL


                         30 years
                  100,000s applications
                     10,000s libraries
               10s programming languages


                                           Applications and libraries
                                           that use the Hadoop APIs

©MapR Technologies - Confidential      5
Hadoop Needs Industry-Standard Interfaces


               Hadoop               • MapReduce and HBase applications
                 API                • Mostly custom-built



                                    • File-based applications
                           NFS      • Supported by most operating systems


                                    • SQL-based tools
                     ODBC           • Supported by most BI applications and
                                      query builders

©MapR Technologies - Confidential           6
NFS


©MapR Technologies - Confidential   7
Your Data is Your Data

     HDFS-based Hadoop distributions do not (cannot)
      support NFS

     Your data is your data – make sure you can access it
       – Why store your data in a system which cannot be accessed
           by 95% of the world’s applications and libraries?


     Access to HDFS source code != access to your data



©MapR Technologies - Confidential      8
The NFS Protocol

     RFC 1813                            WRITE3res NFSPROC3_WRITE(WRITE3args) = 7;

                                          struct WRITE3args {
                                              nfs_fh3     file;
     Very simple protocol                    offset3     offset;
                                              count3      count;
                                              stable_how stable;
     Random reads/writes                     opaque      data<>;
       –   Read count bytes from          };
           offset offset of file file
       –   Write buffer data to           READ3res NFSPROC3_READ(READ3args) = 6;
           offset offset of a file file
                                          struct READ3args {
                                              nfs_fh3 file;
                                              offset3 offset;
     HDFS does not support                   count3   count;
      random writes so it                 };
      cannot support NFS

©MapR Technologies - Confidential               9
S3
                                    o.a.h.fs.s3native.NativeS3FileSystem




©MapR Technologies - Confidential
                                                HDFS
                                      o.a.h.hdfs.DistributedFileSystem
                                                                                                             Storage Layers



                                     Local File System
                                          o.a.h.fs.LocalFileSystem
                                                                                      MapReduce




                                                  FTP
                                         o.a.h.fs.ftp.FTPFileSystem




10
                                    MapR storage layer
                                                                             o.a.h.fs.FileSystem Interface




                                       com.mapr.fs.MapRFileSystem
                                                                                                             Hadoop Was Designed to Support Multiple
                                                                                             Hadoop




                                                             NFS interface
                                                                                             FileSystem API
One NFS Gateway




©MapR Technologies - Confidential   11
Multiple NFS Gateways




©MapR Technologies - Confidential   12
Multiple NFS Gateways with Load Balancing




©MapR Technologies - Confidential   13
Multiple NFS Gateways with NFS HA (VIPs)




©MapR Technologies - Confidential   14
Customer Examples: Import/Export Data

     Network security vendor
       –   Network packet captures from switches are streamed into the cluster
       –   New pattern definitions are loaded into online IPS via NFS


     Online measurement company
       –   Clickstreams from application servers are streamed into the cluster


     SaaS company
       –   Exporting a database to Hadoop over NFS


     Ad exchange
       –   Bids and transactions are streamed into the cluster
©MapR Technologies - Confidential            15
Customer Examples: Productivity and Operations

     Retailer
       –   Operational scripts are easier with NFS than DFS + MapReduce
            •    chmod/chown, file system searches/greps, make, tab-complete
       –   Consolidate object store with analytics

     Credit card company
       –   User and project home directories on Linux gateways
            •    Local files, scripts, source code, …
            •    Administrators manage quotas, snapshots/backups, …


     Large Internet company
       –   Web server serve MapReduce results (item relationships) directly from cluster

     Email marketing company
       –   Object store with HBase and NFS

©MapR Technologies - Confidential                   16
ODBC


©MapR Technologies - Confidential   17
ODBC

     ODBC – Open DataBase Connectivity
       – Open standard API for accessing a SQL-based backend
       – Developed by Microsoft and Simba Technologies in 1992



     Flagship API for SQL-based BI and reporting
       –   Excel, Tableau, MicroStrategy, Crystal Reports, …


     Advanced ODBC drivers use the latest 3.52 specification




©MapR Technologies - Confidential                 18
MapR ODBC Driver

     MapR provides a Hive ODBC 3.52 driver
       – Developed in partnership with ODBC inventor Simba Technologies
       – Compliant with latest ODBC 3.52 specification
         • 32- and 64-bit platform support
            •    Windows and Linux


     Enables direct SQL access to MapR-stored data by translating SQL to
      HiveQL

     SQLizer enables seamless connectivity
       – Provides ANSI SQL-92 front-end
       – Targeted for existing apps that generate standard SQL queries
       – Transforms SQL query into HiveQL query



©MapR Technologies - Confidential              19
Example: Tableau




©MapR Technologies - Confidential   20
Example: Tableau




©MapR Technologies - Confidential   21
Example: Open source query builder (Kaimon)




©MapR Technologies - Confidential   22
Example: Microsoft Excel




©MapR Technologies - Confidential   23
Join MapR

     Join the fastest growing Hadoop company


     Open positions in every discipline
       –   Engineers
       –   Solution Architects
       –   Product Management


     Email jobs@mapr.com




©MapR Technologies - Confidential     24
Time for Questions

     Download slides or send me an email
       –   http://info.mapr.com/HUG-7-2012


     Download MapR to learn more
       –   www.mapr.com/download




©MapR Technologies - Confidential            25

Contenu connexe

Tendances

HBase @ Twitter
HBase @ TwitterHBase @ Twitter
HBase @ Twitterctrezzo
 
Architectural Overview of MapR's Apache Hadoop Distribution
Architectural Overview of MapR's Apache Hadoop DistributionArchitectural Overview of MapR's Apache Hadoop Distribution
Architectural Overview of MapR's Apache Hadoop Distributionmcsrivas
 
MapReduce
MapReduceMapReduce
MapReduceKavyaGo
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7Ted Dunning
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconYiwei Ma
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranMapR Technologies
 
MapReduce Improvements in MapR Hadoop
MapReduce Improvements in MapR HadoopMapReduce Improvements in MapR Hadoop
MapReduce Improvements in MapR Hadoopabord
 
HDFS: Hadoop Distributed Filesystem
HDFS: Hadoop Distributed FilesystemHDFS: Hadoop Distributed Filesystem
HDFS: Hadoop Distributed FilesystemSteve Loughran
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillTomer Shiran
 
Hadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologiesHadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologiesKelly Technologies
 
Hadoop: Past, Present and Future - v2.2 - SQLSaturday #326 - Tampa BA Edition
Hadoop: Past, Present and Future - v2.2 - SQLSaturday #326 - Tampa BA EditionHadoop: Past, Present and Future - v2.2 - SQLSaturday #326 - Tampa BA Edition
Hadoop: Past, Present and Future - v2.2 - SQLSaturday #326 - Tampa BA EditionBig Data Joe™ Rossi
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshotsenissoz
 
Hadoop - HDFS
Hadoop - HDFSHadoop - HDFS
Hadoop - HDFSKavyaGo
 
A Basic Introduction to the Hadoop eco system - no animation
A Basic Introduction to the Hadoop eco system - no animationA Basic Introduction to the Hadoop eco system - no animation
A Basic Introduction to the Hadoop eco system - no animationSameer Tiwari
 

Tendances (20)

HBase @ Twitter
HBase @ TwitterHBase @ Twitter
HBase @ Twitter
 
Architectural Overview of MapR's Apache Hadoop Distribution
Architectural Overview of MapR's Apache Hadoop DistributionArchitectural Overview of MapR's Apache Hadoop Distribution
Architectural Overview of MapR's Apache Hadoop Distribution
 
HBase with MapR
HBase with MapRHBase with MapR
HBase with MapR
 
MapReduce
MapReduceMapReduce
MapReduce
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7
 
Hadoop HDFS
Hadoop HDFSHadoop HDFS
Hadoop HDFS
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qcon
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
 
Apache HBase: State of the Union
Apache HBase: State of the UnionApache HBase: State of the Union
Apache HBase: State of the Union
 
MapReduce Improvements in MapR Hadoop
MapReduce Improvements in MapR HadoopMapReduce Improvements in MapR Hadoop
MapReduce Improvements in MapR Hadoop
 
10c introduction
10c introduction10c introduction
10c introduction
 
HDFS: Hadoop Distributed Filesystem
HDFS: Hadoop Distributed FilesystemHDFS: Hadoop Distributed Filesystem
HDFS: Hadoop Distributed Filesystem
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drill
 
Hadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologiesHadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologies
 
Hadoop 1.x vs 2
Hadoop 1.x vs 2Hadoop 1.x vs 2
Hadoop 1.x vs 2
 
Hadoop: Past, Present and Future - v2.2 - SQLSaturday #326 - Tampa BA Edition
Hadoop: Past, Present and Future - v2.2 - SQLSaturday #326 - Tampa BA EditionHadoop: Past, Present and Future - v2.2 - SQLSaturday #326 - Tampa BA Edition
Hadoop: Past, Present and Future - v2.2 - SQLSaturday #326 - Tampa BA Edition
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshots
 
Hadoop - HDFS
Hadoop - HDFSHadoop - HDFS
Hadoop - HDFS
 
A Basic Introduction to the Hadoop eco system - no animation
A Basic Introduction to the Hadoop eco system - no animationA Basic Introduction to the Hadoop eco system - no animation
A Basic Introduction to the Hadoop eco system - no animation
 
Giraffa - November 2014
Giraffa - November 2014Giraffa - November 2014
Giraffa - November 2014
 

Similaire à July 2012 HUG: Using Standard File-Based Applications and SQL-Based Tools with Hadoop

Big Data Technology Stack : Nutshell
Big Data Technology Stack : NutshellBig Data Technology Stack : Nutshell
Big Data Technology Stack : NutshellKhalid Imran
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introductionChirag Ahuja
 
Big data overview of apache hadoop
Big data overview of apache hadoopBig data overview of apache hadoop
Big data overview of apache hadoopveeracynixit
 
Big data overview of apache hadoop
Big data overview of apache hadoopBig data overview of apache hadoop
Big data overview of apache hadoopveeracynixit
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxDanishMahmood23
 
Hadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapaHadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapakapa rohit
 
Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Rajan Kanitkar
 
Big Data and Hadoop Guide
Big Data and Hadoop GuideBig Data and Hadoop Guide
Big Data and Hadoop GuideSimplilearn
 
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC TechnologiesAccelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC Technologiesinside-BigData.com
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introductionXuan-Chao Huang
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduceDerek Chen
 
Architectural Evolution Starting from Hadoop
Architectural Evolution Starting from HadoopArchitectural Evolution Starting from Hadoop
Architectural Evolution Starting from HadoopSpagoWorld
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeMapR Technologies
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeDataWorks Summit
 
Hadoop ecosystem J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...
Hadoop ecosystem  J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...Hadoop ecosystem  J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...
Hadoop ecosystem J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...AyeeshaParveen
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceChris Nauroth
 

Similaire à July 2012 HUG: Using Standard File-Based Applications and SQL-Based Tools with Hadoop (20)

Big Data Technology Stack : Nutshell
Big Data Technology Stack : NutshellBig Data Technology Stack : Nutshell
Big Data Technology Stack : Nutshell
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Big data overview of apache hadoop
Big data overview of apache hadoopBig data overview of apache hadoop
Big data overview of apache hadoop
 
Big data overview of apache hadoop
Big data overview of apache hadoopBig data overview of apache hadoop
Big data overview of apache hadoop
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptx
 
Hadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapaHadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapa
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014
 
Unit-3_BDA.ppt
Unit-3_BDA.pptUnit-3_BDA.ppt
Unit-3_BDA.ppt
 
Big Data and Hadoop Guide
Big Data and Hadoop GuideBig Data and Hadoop Guide
Big Data and Hadoop Guide
 
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC TechnologiesAccelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
 
In15orlesss hadoop
In15orlesss hadoopIn15orlesss hadoop
In15orlesss hadoop
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
 
Architectural Evolution Starting from Hadoop
Architectural Evolution Starting from HadoopArchitectural Evolution Starting from Hadoop
Architectural Evolution Starting from Hadoop
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About Time
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About Time
 
Hadoop ecosystem J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...
Hadoop ecosystem  J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...Hadoop ecosystem  J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...
Hadoop ecosystem J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduce
 
Unit IV.pdf
Unit IV.pdfUnit IV.pdf
Unit IV.pdf
 

Plus de Yahoo Developer Network

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaYahoo Developer Network
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Yahoo Developer Network
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanYahoo Developer Network
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Yahoo Developer Network
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathYahoo Developer Network
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuYahoo Developer Network
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolYahoo Developer Network
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Yahoo Developer Network
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Yahoo Developer Network
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathYahoo Developer Network
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Yahoo Developer Network
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathYahoo Developer Network
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsYahoo Developer Network
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Yahoo Developer Network
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondYahoo Developer Network
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Yahoo Developer Network
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...Yahoo Developer Network
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexYahoo Developer Network
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsYahoo Developer Network
 

Plus de Yahoo Developer Network (20)

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
 
CICD at Oath using Screwdriver
CICD at Oath using ScrewdriverCICD at Oath using Screwdriver
CICD at Oath using Screwdriver
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, Oath
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI Applications
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step Beyond
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
 

Dernier

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 

Dernier (20)

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 

July 2012 HUG: Using Standard File-Based Applications and SQL-Based Tools with Hadoop

  • 1. Using Standard File-Based Applications and SQL-Based Tools with Hadoop ©MapR Technologies - Confidential 1
  • 2. http://info.mapr.com/HUG-7-2012  Tomer Shiran  tshiran@maprtech.com  Director of Product Management, MapR Technologies ©MapR Technologies - Confidential 2
  • 3. The MapR Distribution for Apache Hadoop  The open, enterprise-grade distribution for Apache Hadoop – Open source components • Hive, Pig, Cascading, HBase, ZooKeeper, Oozie, Flume, Sqoop, Whirr, … – Enhancements to make Hadoop more open and enterprise-grade  Fastest growing distribution – Thousands of clusters deployed  Now available as a service with Amazon Elastic MapReduce (EMR) – http://aws.amazon.com/elasticmapreduce/mapr ©MapR Technologies - Confidential 3
  • 4. MapR Make Hadoop Make Hadoop more open enterprise-grade This presentation ©MapR Technologies - Confidential 4
  • 5. Not All Applications Use the Hadoop APIs Applications and libraries that use files and/or SQL 30 years 100,000s applications 10,000s libraries 10s programming languages Applications and libraries that use the Hadoop APIs ©MapR Technologies - Confidential 5
  • 6. Hadoop Needs Industry-Standard Interfaces Hadoop • MapReduce and HBase applications API • Mostly custom-built • File-based applications NFS • Supported by most operating systems • SQL-based tools ODBC • Supported by most BI applications and query builders ©MapR Technologies - Confidential 6
  • 7. NFS ©MapR Technologies - Confidential 7
  • 8. Your Data is Your Data  HDFS-based Hadoop distributions do not (cannot) support NFS  Your data is your data – make sure you can access it – Why store your data in a system which cannot be accessed by 95% of the world’s applications and libraries?  Access to HDFS source code != access to your data ©MapR Technologies - Confidential 8
  • 9. The NFS Protocol  RFC 1813 WRITE3res NFSPROC3_WRITE(WRITE3args) = 7; struct WRITE3args { nfs_fh3 file;  Very simple protocol offset3 offset; count3 count; stable_how stable;  Random reads/writes opaque data<>; – Read count bytes from }; offset offset of file file – Write buffer data to READ3res NFSPROC3_READ(READ3args) = 6; offset offset of a file file struct READ3args { nfs_fh3 file; offset3 offset;  HDFS does not support count3 count; random writes so it }; cannot support NFS ©MapR Technologies - Confidential 9
  • 10. S3 o.a.h.fs.s3native.NativeS3FileSystem ©MapR Technologies - Confidential HDFS o.a.h.hdfs.DistributedFileSystem Storage Layers Local File System o.a.h.fs.LocalFileSystem MapReduce FTP o.a.h.fs.ftp.FTPFileSystem 10 MapR storage layer o.a.h.fs.FileSystem Interface com.mapr.fs.MapRFileSystem Hadoop Was Designed to Support Multiple Hadoop NFS interface FileSystem API
  • 11. One NFS Gateway ©MapR Technologies - Confidential 11
  • 12. Multiple NFS Gateways ©MapR Technologies - Confidential 12
  • 13. Multiple NFS Gateways with Load Balancing ©MapR Technologies - Confidential 13
  • 14. Multiple NFS Gateways with NFS HA (VIPs) ©MapR Technologies - Confidential 14
  • 15. Customer Examples: Import/Export Data  Network security vendor – Network packet captures from switches are streamed into the cluster – New pattern definitions are loaded into online IPS via NFS  Online measurement company – Clickstreams from application servers are streamed into the cluster  SaaS company – Exporting a database to Hadoop over NFS  Ad exchange – Bids and transactions are streamed into the cluster ©MapR Technologies - Confidential 15
  • 16. Customer Examples: Productivity and Operations  Retailer – Operational scripts are easier with NFS than DFS + MapReduce • chmod/chown, file system searches/greps, make, tab-complete – Consolidate object store with analytics  Credit card company – User and project home directories on Linux gateways • Local files, scripts, source code, … • Administrators manage quotas, snapshots/backups, …  Large Internet company – Web server serve MapReduce results (item relationships) directly from cluster  Email marketing company – Object store with HBase and NFS ©MapR Technologies - Confidential 16
  • 17. ODBC ©MapR Technologies - Confidential 17
  • 18. ODBC  ODBC – Open DataBase Connectivity – Open standard API for accessing a SQL-based backend – Developed by Microsoft and Simba Technologies in 1992  Flagship API for SQL-based BI and reporting – Excel, Tableau, MicroStrategy, Crystal Reports, …  Advanced ODBC drivers use the latest 3.52 specification ©MapR Technologies - Confidential 18
  • 19. MapR ODBC Driver  MapR provides a Hive ODBC 3.52 driver – Developed in partnership with ODBC inventor Simba Technologies – Compliant with latest ODBC 3.52 specification • 32- and 64-bit platform support • Windows and Linux  Enables direct SQL access to MapR-stored data by translating SQL to HiveQL  SQLizer enables seamless connectivity – Provides ANSI SQL-92 front-end – Targeted for existing apps that generate standard SQL queries – Transforms SQL query into HiveQL query ©MapR Technologies - Confidential 19
  • 22. Example: Open source query builder (Kaimon) ©MapR Technologies - Confidential 22
  • 23. Example: Microsoft Excel ©MapR Technologies - Confidential 23
  • 24. Join MapR  Join the fastest growing Hadoop company  Open positions in every discipline – Engineers – Solution Architects – Product Management  Email jobs@mapr.com ©MapR Technologies - Confidential 24
  • 25. Time for Questions  Download slides or send me an email – http://info.mapr.com/HUG-7-2012  Download MapR to learn more – www.mapr.com/download ©MapR Technologies - Confidential 25