Using Standard File-Based Applications and SQL-Based Tools with Hadoop

©MapR Technologies
  
Who am I?

http://www.mapr.com/company/events/speaking/dc-hug-9-18-12

§ Keys Botzum
§ kbotzum@maprtech.com
§ Senior Principal Technologist, MapR Technologies




  
The MapR Distribution for Apache Hadoop

§ The open, enterprise-grade distribution for Apache Hadoop
   – Open source components
      • Hive, Pig, Cascading, HBase, ZooKeeper, Oozie, Flume, Sqoop, Whirr, …
   – Enhancements to make Hadoop more open and enterprise-grade
§ Growing fast and a recognized leader




  
MapR in the Cloud

§ Available as a service with Amazon Elastic MapReduce (EMR)
   – http://aws.amazon.com/elasticmapreduce/mapr
§ Available as a service with Google Compute Engine




  
MapR

Make Hadoop more open (this presentation)

Make Hadoop enterprise-grade
   • High Availability
   • Scalability
   • Management tools – Web, CLI, REST
   • Data Protection – snapshots & mirroring
   • Performance

  
Not All Applications Use the Hadoop APIs

Applications and libraries that use files and/or SQL
   • 30 years
   • 100,000s of applications
   • 10,000s of libraries
   • 10s of programming languages
   • These are not legacy applications; they are valuable applications

Applications and libraries that use the Hadoop APIs
  
Hadoop Needs Industry-Standard Interfaces

Hadoop API
   • MapReduce and HBase applications
   • Mostly custom-built

NFS
   • File-based applications
   • Supported by most operating systems

ODBC
   • SQL-based tools
   • Supported by most BI applications and query builders
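As a rough illustration of what "file-based application" means here, the sketch below updates a fixed-width record in place using only ordinary POSIX-style calls (os.pwrite and os.pread, available on Unix-like systems). The record layout and function names are hypothetical, and a temporary directory stands in for an NFS mount point. The point is that the same code would run unchanged against a path on a cluster mounted over NFS, whereas a store without random writes cannot serve the in-place overwrite.

```python
import os
import tempfile

RECORD_SIZE = 16  # hypothetical fixed-width record layout, in bytes


def write_record(path: str, index: int, payload: bytes) -> None:
    """Overwrite record `index` in place, padding it to RECORD_SIZE bytes."""
    if len(payload) > RECORD_SIZE:
        raise ValueError("payload larger than one record")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Random write: write at an arbitrary byte offset without appending.
        os.pwrite(fd, payload.ljust(RECORD_SIZE, b" "), index * RECORD_SIZE)
    finally:
        os.close(fd)


def read_record(path: str, index: int) -> bytes:
    """Random read of one record at its byte offset."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, RECORD_SIZE, index * RECORD_SIZE)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # A temporary directory stands in for an NFS mount point.
    path = os.path.join(tempfile.mkdtemp(), "records.dat")
    write_record(path, 0, b"alpha")
    write_record(path, 1, b"beta")
    write_record(path, 1, b"BETA")  # in-place update of record 1
    print(read_record(path, 1))
```

Fixed-width records keep the offset arithmetic trivial; a real application might use an index instead, but it would still depend on writing at arbitrary offsets.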
  

  
NFS	
  


  
Your Data is Important

§ HDFS-based Hadoop distributions do not (cannot) properly support NFS
§ Your data is important; it drives your business, so make sure you can access it
   – Why store your data in a system that cannot be accessed by 95% of the world's applications and libraries?
§ Access to HDFS source code != access to your data


  
The NFS Protocol

§ RFC 1813
§ Very simple protocol
§ Random reads/writes
   – Read count bytes from offset offset of file file
   – Write buffer data to offset offset of file file
§ HDFS does not support random writes, so it cannot support NFS

    WRITE3res NFSPROC3_WRITE(WRITE3args) = 7;

    struct WRITE3args {
        nfs_fh3     file;
        offset3     offset;
        count3      count;
        stable_how  stable;
        opaque      data<>;
    };

    READ3res NFSPROC3_READ(READ3args) = 6;

    struct READ3args {
        nfs_fh3  file;
        offset3  offset;
        count3   count;
    };
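To show how little is on the wire, here is a sketch that XDR-encodes the READ3args and WRITE3args structures above. It follows RFC 1813's types (nfs_fh3 as variable-length opaque of at most 64 bytes, offset3 as an unsigned 64-bit integer, count3 as an unsigned 32-bit integer, stable_how as an enum) and XDR's big-endian, 4-byte-aligned rules from RFC 4506. It covers only the procedure arguments; the ONC RPC header and credentials that wrap them in a real request are omitted, and the function names are hypothetical.

```python
import struct

NFS3_FHSIZE = 64                          # max NFSv3 file handle size (RFC 1813)
UNSTABLE, DATA_SYNC, FILE_SYNC = 0, 1, 2  # stable_how enum values


def xdr_opaque(data: bytes) -> bytes:
    """Variable-length opaque: 4-byte length, the bytes, zero padding to 4 bytes."""
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def _fh(fh: bytes) -> bytes:
    if len(fh) > NFS3_FHSIZE:
        raise ValueError("NFSv3 file handles are at most 64 bytes")
    return xdr_opaque(fh)


def encode_read3args(fh: bytes, offset: int, count: int) -> bytes:
    # file (nfs_fh3), offset (uint64), count (uint32)
    return _fh(fh) + struct.pack(">QI", offset, count)


def encode_write3args(fh: bytes, offset: int, data: bytes,
                      stable: int = FILE_SYNC) -> bytes:
    # file, offset, count (= len(data)), stable (4-byte enum), data (opaque<>)
    return (_fh(fh)
            + struct.pack(">QIi", offset, len(data), stable)
            + xdr_opaque(data))


if __name__ == "__main__":
    args = encode_write3args(b"\x01\x02\x03", 4096, b"hello")
    print(len(args), args.hex())
```

Any offset is legal in both calls, so a server implementing this has to honor writes in the middle of a file, which is the capability the slide says HDFS lacks.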
  
	
  
  
Hadoop Was Designed to Support Multiple Storage Layers

[Diagram: MapReduce reaches every storage layer through the Hadoop FileSystem API, the o.a.h.fs.FileSystem interface (o.a.h = org.apache.hadoop)]
   • HDFS – o.a.h.hdfs.DistributedFileSystem
   • S3 – o.a.h.fs.s3native.NativeS3FileSystem
   • Local File System – o.a.h.fs.LocalFileSystem
   • FTP – o.a.h.fs.ftp.FTPFileSystem
   • MapR storage layer – com.mapr.fs.MapRFileSystem, which also provides an NFS interface
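The design on this slide is a plugin interface: callers code against one abstract file system type, and the URI scheme selects the implementation. Hadoop does this in Java through FileSystem.get(URI, Configuration), which looks up the class configured for the scheme (for example via the fs.<scheme>.impl setting). The Python below is only a hypothetical mimic of that pattern; the class names, the "mem" scheme, and the registry are illustrative and are not Hadoop's API.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple, Type
from urllib.parse import urlparse


class FileSystem(ABC):
    """Hypothetical stand-in for the role o.a.h.fs.FileSystem plays."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...


_REGISTRY: Dict[str, Type[FileSystem]] = {}


def register(scheme: str):
    """Class decorator that maps a URI scheme to a FileSystem implementation."""
    def decorate(cls: Type[FileSystem]) -> Type[FileSystem]:
        _REGISTRY[scheme] = cls
        return cls
    return decorate


@register("file")
class LocalFileSystem(FileSystem):
    def read(self, path: str) -> bytes:
        with open(path, "rb") as f:
            return f.read()


@register("mem")
class InMemoryFileSystem(FileSystem):
    """Toy storage layer standing in for HDFS, S3, MapR-FS, and so on."""
    store: Dict[str, bytes] = {}  # shared across instances for the demo

    def read(self, path: str) -> bytes:
        return self.store[path]


def get_filesystem(uri: str) -> Tuple[FileSystem, str]:
    """Pick the implementation from the URI scheme (no scheme means local)."""
    parsed = urlparse(uri)
    scheme = parsed.scheme or "file"
    cls = _REGISTRY.get(scheme)
    if cls is None:
        raise ValueError(f"no FileSystem registered for scheme {scheme!r}")
    return cls(), parsed.path
```

Usage: after `InMemoryFileSystem.store["/data/part-00000"] = b"hello"`, calling `get_filesystem("mem://cluster/data/part-00000")` returns an InMemoryFileSystem and the path, and code that only calls `read` never needs to know which layer it got.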
  
One NFS Gateway

   What about scalability and high availability?
  
  
Multiple NFS Gateways
  




  
Multiple NFS Gateways with Load Balancing
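The slide does not say how requests are spread across gateways; DNS round robin and a dedicated load balancer are both common choices. As one hypothetical option, this sketch shows client-side round robin that skips gateways marked unhealthy. The class and gateway names are made up for illustration.

```python
from typing import Iterable, List


class GatewayBalancer:
    """Round-robin over NFS gateways, skipping any marked unhealthy."""

    def __init__(self, gateways: Iterable[str]) -> None:
        self.gateways: List[str] = list(gateways)
        if not self.gateways:
            raise ValueError("need at least one gateway")
        self.healthy = set(self.gateways)
        self._next = 0

    def mark_down(self, gateway: str) -> None:
        self.healthy.discard(gateway)

    def mark_up(self, gateway: str) -> None:
        if gateway in self.gateways:
            self.healthy.add(gateway)

    def pick(self) -> str:
        """Return the next healthy gateway in rotation."""
        n = len(self.gateways)
        for _ in range(n):
            gateway = self.gateways[self._next % n]
            self._next += 1
            if gateway in self.healthy:
                return gateway
        raise RuntimeError("no healthy NFS gateway available")
```

Spreading clients this way adds throughput as gateways are added, but a client already mounted on a gateway that dies still loses its mount, which is the gap the VIP-based HA on the next slide addresses.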
  




  
Multiple NFS Gateways with NFS HA (VIPs)
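With virtual IPs (VIPs), clients mount a floating address rather than a specific host; when a gateway fails, its addresses move to surviving gateways so existing mounts can keep working. The slide does not describe MapR's actual failover logic, so the sketch below is a hypothetical version of just the bookkeeping: keep VIPs on gateways that are still alive and hand orphaned VIPs to the least-loaded survivor. Taking over the address on the network (typically by announcing it with a gratuitous ARP) is outside the sketch.

```python
from typing import Dict, Iterable, List, Optional


def assign_vips(vips: Iterable[str], live_gateways: List[str],
                current: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Map each VIP to a live gateway, moving only VIPs whose gateway died."""
    if not live_gateways:
        raise RuntimeError("no live NFS gateways")
    vips = list(vips)
    live = set(live_gateways)
    # Keep every existing assignment whose gateway is still alive.
    result = {v: g for v, g in (current or {}).items() if v in vips and g in live}
    load = {g: 0 for g in live_gateways}
    for g in result.values():
        load[g] += 1
    # Give each orphaned VIP to the least-loaded gateway (first one on ties).
    for v in vips:
        if v not in result:
            target = min(live_gateways, key=lambda g: load[g])
            result[v] = target
            load[target] += 1
    return result
```

Leaving surviving assignments untouched is a deliberate choice in this sketch: moving a VIP that did not need to move would disturb clients that were unaffected by the failure.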
  




  
Customer Examples: Import/Export Data

§ Network security vendor
   – Network packet captures from switches are streamed into the cluster
   – New pattern definitions are loaded into the online IPS (intrusion prevention system) via NFS
§ Online measurement company
   – Clickstreams from application servers are streamed into the cluster
§ SaaS company
   – Exporting a database to Hadoop over NFS
§ Ad exchange
   – Bids and transactions are streamed into the cluster
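Streaming data in over NFS needs nothing beyond ordinary file I/O on the producer side. A minimal sketch, assuming newline-delimited JSON as the record format and a temporary directory in place of the cluster mount (the slide names neither): append a batch of clickstream events, then fsync so the batch is on stable storage before the function returns.

```python
import json
import os
import tempfile
from typing import Iterable, Mapping


def append_events(path: str, events: Iterable[Mapping]) -> int:
    """Append events as JSON lines; return how many were written."""
    n = 0
    with open(path, "a", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event, sort_keys=True) + "\n")
            n += 1
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to commit the data to stable storage
    return n


if __name__ == "__main__":
    # Hypothetical: on a real cluster this would be a path under the NFS mount.
    path = os.path.join(tempfile.mkdtemp(), "clicks.jsonl")
    append_events(path, [{"user": "u1", "url": "/home"}])
    append_events(path, [{"user": "u2", "url": "/cart"}])
    with open(path, encoding="utf-8") as f:
        print(f.read())
```

Batching before the fsync trades a little latency for far fewer round trips to the server.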
  
  
Customer Examples: Productivity and Operations

§ Retailer
   – Operational scripts are easier with NFS than with HDFS + MapReduce
      • chmod/chown, file system searches/greps, perl, awk, tab-complete
   – Consolidate object store with analytics
§ Credit card company
   – User and project home directories on Linux gateways
      • Local files, scripts, source code, …
      • Administrators manage quotas, snapshots/backups, …
§ Large Internet company's recommendation system
   – Web servers serve MapReduce results (item relationships) directly from the cluster
§ Email marketing company
   – Object store with HBase and NFS
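The retailer's point is that once the cluster is a mounted file system, everyday tools such as grep work on it directly. As a hypothetical sketch in the same spirit, the function below walks a directory tree (which could be a path under the NFS mount; a temporary directory stands in for it in the demo) and reports matching lines, roughly what grep -rn does.

```python
import os
import re
import tempfile
from typing import List, Tuple


def grep_tree(root: str, pattern: str) -> List[Tuple[str, int, str]]:
    """Return (path, line number, line) for every line under root matching pattern."""
    rx = re.compile(pattern)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # errors="replace" keeps oddly encoded files from aborting the scan
            with open(path, encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if rx.search(line):
                        hits.append((path, lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "logs"))
    with open(os.path.join(root, "logs", "app.log"), "w") as f:
        f.write("ok\nERROR disk full\nok\n")
    for hit in grep_tree(root, r"ERROR"):
        print(hit)
```

The equivalent over the HDFS API would need a dedicated client or a MapReduce job; over NFS it is a plain directory walk.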
  


  
ODBC	
  


  
ODBC

§ ODBC – Open Database Connectivity
   – Open standard API for accessing a SQL-based backend
   – Developed by Microsoft and Simba Technologies in 1992
§ Flagship API for SQL-based BI and reporting
   – Excel, Tableau, MicroStrategy, Crystal Reports, …
§ Advanced ODBC drivers use the latest 3.52 specification
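A BI tool typically reaches an ODBC driver through a DSN or a connection string of semicolon-separated keyword=value pairs, and the ODBC documentation advises wrapping values that contain characters such as []{}(),;?*=!@ in braces. A minimal sketch of building such a string follows. DRIVER is a standard ODBC keyword, but the driver name and the HOST and PORT keywords in the usage note are hypothetical placeholders, since each driver defines its own keywords. The sketch refuses values containing a closing brace rather than guess at driver-specific escaping.

```python
from typing import Mapping

# Characters the ODBC connection-string grammar treats specially; values
# containing them (or whitespace) are wrapped in braces in this sketch.
_SPECIAL = set("[]{}(),;?*=!@")


def odbc_connection_string(attrs: Mapping[str, object]) -> str:
    """Build 'KEY=value;KEY2=value2;' from an ordered mapping."""
    parts = []
    for key, raw in attrs.items():
        value = str(raw)
        if "}" in value:
            raise ValueError(f"value for {key} contains '}}', not handled here")
        if _SPECIAL.intersection(value) or any(c.isspace() for c in value):
            value = "{" + value + "}"
        parts.append(f"{key}={value}")
    return ";".join(parts) + ";"
```

For example, `{"DRIVER": "Example Hive ODBC Driver", "HOST": "hive.example.com", "PORT": 10000}` yields `DRIVER={Example Hive ODBC Driver};HOST=hive.example.com;PORT=10000;`, a string an application would hand to the driver manager, for instance through SQLDriverConnect.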
  




  
MapR ODBC Driver

§ MapR provides a Hive ODBC 3.52 driver
   – Developed in partnership with ODBC inventor Simba Technologies
   – Compliant with the latest ODBC 3.52 specification
      • 32- and 64-bit platform support
      • Windows and Linux
§ Enables direct SQL access to MapR-stored data by translating SQL to HiveQL
§ SQLizer enables seamless connectivity
   – Provides an ANSI SQL-92 front end
   – Targeted at existing apps that generate standard SQL queries
   – Transforms SQL queries into HiveQL queries
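The slide does not describe how SQLizer rewrites queries, so the following is only a hypothetical toy to make "transforms SQL into HiveQL" concrete. Hive releases before 0.13 did not accept the comma-separated implicit join that SQL-92 tools commonly generate and required an explicit JOIN ... ON instead. The function rewrites only the simplest form of that pattern (two tables, one equality condition) and returns any other query unchanged; a real translator would parse the SQL rather than pattern-match it.

```python
import re

# Toy pattern: FROM <t1> [alias], <t2> [alias] WHERE <a.col> = <b.col> at end of query.
_IMPLICIT_JOIN = re.compile(
    r"FROM\s+(\w+(?:\s+\w+)?)\s*,\s*(\w+(?:\s+\w+)?)"
    r"\s+WHERE\s+(\w+\.\w+)\s*=\s*(\w+\.\w+)\s*$",
    re.IGNORECASE,
)


def rewrite_implicit_join(sql: str) -> str:
    """Rewrite a two-table implicit join into explicit JOIN ... ON syntax."""
    text = sql.strip()
    m = _IMPLICIT_JOIN.search(text)
    if m is None:
        return sql  # not the narrow pattern this toy understands
    left, right, lhs, rhs = m.groups()
    return text[:m.start()] + f"FROM {left} JOIN {right} ON ({lhs} = {rhs})"


if __name__ == "__main__":
    print(rewrite_implicit_join(
        "SELECT o.id, c.name FROM orders o, customers c WHERE o.cust_id = c.id"))
```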
  



  
Example: Tableau
  




  
Example: Open source query builder (Kaimon)
  




  
Example: Microsoft Excel
  




  
In Summary

§ Open standards are important
§ Supporting existing applications and tools that support those standards is valuable
   – Preserves investment in tools
   – Preserves investment in custom applications that preceded Hadoop
   – Leverages skills you already have
  




  
Join MapR

§ Join the fastest-growing Hadoop company
§ Open positions in every discipline
   – Engineers
   – Solution Architects
   – Product Management
§ Email jobs@mapr.com
  




  
Time for Questions

§ Download slides or send me an email
   – http://www.mapr.com/company/events/speaking/dc-hug-9-18-12
§ Download MapR to learn more
   – www.mapr.com/download
  




  

Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in Healthcare
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQL
 

Dernier

Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Dernier (20)

Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Using Standard File-Based Applications and SQL Tools with Hadoop

  • 1. Using Standard File-Based Applications and SQL-Based Tools with Hadoop ©MapR Technologies
  • 2. Who am I? http://www.mapr.com/company/events/speaking/dc-hug-9-18-12 § Keys Botzum § kbotzum@maprtech.com § Senior Principal Technologist, MapR Technologies
  • 3. The MapR Distribution for Apache Hadoop § The open, enterprise-grade distribution for Apache Hadoop – Open source components • Hive, Pig, Cascading, HBase, ZooKeeper, Oozie, Flume, Sqoop, Whirr, … – Enhancements to make Hadoop more open and enterprise-grade § Growing fast and a recognized leader
  • 4. MapR in the Cloud § Available as a service with Amazon Elastic MapReduce (EMR) – http://aws.amazon.com/elasticmapreduce/mapr § Available as a service with Google Compute Engine
  • 5. MapR § Make Hadoop more open – This presentation § Make Hadoop enterprise-grade • High Availability • Scalability • Management tools – Web, CLI, REST • Data Protection – snapshots & mirroring • Performance
  • 6. Not All Applications Use the Hadoop APIs § Applications and libraries that use files and/or SQL – 30 years – 100,000s of applications – 10,000s of libraries – 10s of programming languages • These are not legacy applications; they are valuable applications § Applications and libraries that use the Hadoop APIs
  • 7. Hadoop Needs Industry-Standard Interfaces § Hadoop API – MapReduce and HBase applications – Mostly custom-built § NFS – File-based applications – Supported by most operating systems § ODBC – SQL-based tools – Supported by most BI applications and query builders
  • 9. Your Data is Important § HDFS-based Hadoop distributions do not (cannot) properly support NFS § Your data is important: it drives your business, so make sure you can access it – Why store your data in a system that cannot be accessed by 95% of the world's applications and libraries? § Access to HDFS source code != access to your data
  • 10. The NFS Protocol § RFC 1813 § Very simple protocol § Random reads/writes – READ: return count bytes of file, starting at offset – WRITE: write the buffer data into file, starting at offset § HDFS does not support random writes, so it cannot support NFS

      WRITE3res NFSPROC3_WRITE(WRITE3args) = 7;

      struct WRITE3args {
          nfs_fh3     file;
          offset3     offset;
          count3      count;
          stable_how  stable;
          opaque      data<>;
      };

      READ3res NFSPROC3_READ(READ3args) = 6;

      struct READ3args {
          nfs_fh3  file;
          offset3  offset;
          count3   count;
      };
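A minimal Python sketch of why slide 10's argument holds, assuming a toy in-memory model rather than a real NFS server. WriteArgs and ReadArgs are simplified, hypothetical stand-ins for the RFC 1813 argument structs (no file handles, stability flags, or XDR encoding). RandomAccessStore accepts a write at any offset, as NFS WRITE requires; AppendOnlyStore mimics a write-once/append-only store by rejecting any write that does not start at the current end of file.

```python
from dataclasses import dataclass


# Hypothetical toy model of NFS v3 READ/WRITE semantics; not an NFS server.
@dataclass
class WriteArgs:
    """Simplified stand-in for RFC 1813 WRITE3args: file name, offset, data."""
    file: str
    offset: int
    data: bytes


@dataclass
class ReadArgs:
    """Simplified stand-in for RFC 1813 READ3args: file name, offset, count."""
    file: str
    offset: int
    count: int


class RandomAccessStore:
    """Accepts writes at any offset, which is what NFS WRITE requires."""

    def __init__(self) -> None:
        self.files: dict[str, bytearray] = {}

    def write(self, args: WriteArgs) -> int:
        buf = self.files.setdefault(args.file, bytearray())
        end = args.offset + len(args.data)
        if end > len(buf):
            # Pad with zero bytes so a write past end of file leaves a hole.
            buf.extend(bytes(end - len(buf)))
        buf[args.offset:end] = args.data
        return len(args.data)

    def read(self, args: ReadArgs) -> bytes:
        buf = self.files.get(args.file, bytearray())
        return bytes(buf[args.offset:args.offset + args.count])


class AppendOnlyStore(RandomAccessStore):
    """Write-once/append-only store: any write not at end of file is rejected."""

    def write(self, args: WriteArgs) -> int:
        length = len(self.files.get(args.file, b""))
        if args.offset != length:
            raise OSError(f"random write at offset {args.offset} rejected "
                          f"(file length is {length})")
        return super().write(args)


if __name__ == "__main__":
    nfs_like = RandomAccessStore()
    nfs_like.write(WriteArgs("f", 0, b"hello world"))
    nfs_like.write(WriteArgs("f", 6, b"WORLD"))  # overwrite in the middle
    print(nfs_like.read(ReadArgs("f", 0, 11)))  # b'hello WORLD'

    append_only = AppendOnlyStore()
    append_only.write(WriteArgs("f", 0, b"hello world"))
    try:
        append_only.write(WriteArgs("f", 6, b"WORLD"))
    except OSError as err:
        print("append-only store:", err)
```

An NFS client may legitimately issue the second kind of write at any time (for example when an editor saves a file in place), so a backend that only appends has no correct way to answer it.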
  • 11. Hadoop Was Designed to Support Multiple Storage Layers § MapReduce runs on the Hadoop FileSystem API (o.a.h.fs.FileSystem interface), which has several storage-layer implementations – HDFS: o.a.h.hdfs.DistributedFileSystem – S3: o.a.h.fs.s3native.NativeS3FileSystem – Local File System: o.a.h.fs.LocalFileSystem – FTP: o.a.h.fs.ftp.FTPFileSystem – MapR storage layer: com.mapr.fs.MapRFileSystem, also exposed through an NFS interface
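The pluggable design on slide 11 can be illustrated with a small Python analogue, under the assumption that a scheme-to-class registry captures the idea: application code asks for a filesystem by URI scheme and then uses only the abstract interface. This is not Hadoop's Java API; the class names, the register decorator, and the "mem" scheme are hypothetical.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse

# Toy Python analogue of a pluggable filesystem layer; not Hadoop's Java API.
_REGISTRY: dict[str, type] = {}


def register(scheme: str):
    """Class decorator mapping a URI scheme to a FileSystem implementation."""
    def decorator(cls):
        _REGISTRY[scheme] = cls
        return cls
    return decorator


class FileSystem(ABC):
    """The one interface application code is written against."""

    @abstractmethod
    def read_bytes(self, path: str) -> bytes: ...


@register("file")
class LocalFS(FileSystem):
    def read_bytes(self, path: str) -> bytes:
        with open(path, "rb") as f:
            return f.read()


@register("mem")  # hypothetical scheme for an in-memory store
class MemoryFS(FileSystem):
    blobs: dict[str, bytes] = {}  # shared across instances for the demo

    def read_bytes(self, path: str) -> bytes:
        return self.blobs[path]


def get_filesystem(uri: str) -> FileSystem:
    """Pick an implementation by URI scheme; a bare path means the local FS."""
    scheme = urlparse(uri).scheme or "file"
    try:
        return _REGISTRY[scheme]()
    except KeyError:
        raise ValueError(f"no FileSystem registered for scheme {scheme!r}") from None


def count_lines(uri: str) -> int:
    """'Job' code: it only sees the abstract FileSystem interface."""
    fs = get_filesystem(uri)
    return fs.read_bytes(urlparse(uri).path).count(b"\n")


if __name__ == "__main__":
    MemoryFS.blobs["/logs/a.txt"] = b"GET /\nGET /about\n"
    print(count_lines("mem:///logs/a.txt"))  # 2
```

Adding a storage layer in this pattern means registering one more class; count_lines does not change, which is the property the slide relies on when it places the MapR storage layer alongside HDFS, S3, and the others.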
  • 12. One NFS Gateway: What about scalability and high availability?
  • 13. Multiple NFS Gateways
  • 14. Multiple NFS Gateways with Load Balancing
  • 15. Multiple NFS Gateways with NFS HA (VIPs)
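As a rough illustration of the idea behind slide 15, the following Python sketch models NFS HA with virtual IPs (VIPs): clients mount a VIP rather than a specific gateway, and when a gateway fails its VIPs are reassigned to the least-loaded survivors. The round-robin initial placement, the least-loaded rebalancing rule, the gateway names, and the 192.0.2.x documentation addresses are all assumptions for the sketch, not a description of how MapR implements failover.

```python
# Toy model of NFS HA with virtual IPs (VIPs); the placement and rebalancing
# rules here are assumptions for illustration, not MapR's algorithm.


def assign_vips(vips: list[str], gateways: list[str]) -> dict[str, str]:
    """Spread VIPs round-robin across the healthy NFS gateways."""
    if not gateways:
        raise ValueError("no healthy gateways")
    return {vip: gateways[i % len(gateways)] for i, vip in enumerate(vips)}


def fail_over(assignment: dict[str, str], gateways: list[str],
              failed: str) -> dict[str, str]:
    """Move each VIP held by `failed` to the currently least-loaded survivor.

    Clients keep mounting the same VIP, so they never need to know which
    physical gateway is answering.
    """
    survivors = [g for g in gateways if g != failed]
    if not survivors:
        raise ValueError("no surviving gateways")
    load = {g: 0 for g in survivors}
    for gw in assignment.values():
        if gw in load:
            load[gw] += 1
    result = {}
    for vip in sorted(assignment):
        gw = assignment[vip]
        if gw == failed:
            # Tie-break on name so the outcome is deterministic.
            gw = min(survivors, key=lambda g: (load[g], g))
            load[gw] += 1
        result[vip] = gw
    return result


if __name__ == "__main__":
    gateways = ["gw1", "gw2", "gw3"]
    vips = ["192.0.2.11", "192.0.2.12", "192.0.2.13", "192.0.2.14"]
    before = assign_vips(vips, gateways)
    after = fail_over(before, gateways, "gw1")
    print(before)
    print(after)
```

Only the failed gateway's VIPs move, so clients already attached to healthy gateways are not disturbed; this is the difference between slide 14 (load balancing alone) and slide 15 (load balancing plus failover).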
  • 16. Customer Examples: Import/Export Data § Network security vendor – Network packet captures from switches are streamed into the cluster – New pattern definitions are loaded into the online IPS via NFS § Online measurement company – Clickstreams from application servers are streamed into the cluster § SaaS company – Exporting a database to Hadoop over NFS § Ad exchange – Bids and transactions are streamed into the cluster
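The SaaS example on slide 16 (exporting a database to Hadoop over NFS) needs nothing beyond ordinary file I/O once the cluster is mounted. A minimal Python sketch, assuming the output directory is an NFS mount of the cluster (a path such as /mapr/my.cluster.com/exports is hypothetical here) and using sqlite3 only as a stand-in source database; export_table and the clicks table are illustrative names.

```python
import csv
import os
import sqlite3
import tempfile


def export_table(conn: sqlite3.Connection, table: str, out_dir: str) -> str:
    """Dump one table to CSV in out_dir using only standard file I/O.

    out_dir is assumed to be an NFS-mounted cluster directory (hypothetical
    path, e.g. /mapr/my.cluster.com/exports); no Hadoop client is involved.
    """
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    os.makedirs(out_dir, exist_ok=True)
    final_path = os.path.join(out_dir, f"{table}.csv")
    tmp_path = final_path + ".tmp"
    cur = conn.execute(f"SELECT * FROM {table}")
    with open(tmp_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])  # header row
        writer.writerows(cur)
    # Publish under the final name only once the file is complete.
    os.replace(tmp_path, final_path)
    return final_path


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (user_id TEXT, url TEXT)")
    conn.executemany("INSERT INTO clicks VALUES (?, ?)",
                     [("u1", "/home"), ("u2", "/pricing")])
    with tempfile.TemporaryDirectory() as mount_point:  # stand-in for the mount
        path = export_table(conn, "clicks", mount_point)
        with open(path, newline="") as f:
            print(f.read())
```

Writing to a temporary name and renaming afterwards is a common convention so that downstream jobs are less likely to pick up a half-written file; how strongly the rename is guaranteed to be atomic depends on the underlying filesystem.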
  • 17. Customer Examples: Productivity and Operations § Retailer – Operational scripts are easier with NFS than with HDFS + MapReduce • chmod/chown, file system searches/greps, perl, awk, tab-complete – Consolidate object store with analytics § Credit card company – User and project home directories on Linux gateways • Local files, scripts, source code, … • Administrators manage quotas, snapshots/backups, … § Large Internet company recommendation system – Web servers serve MapReduce results (item relationships) directly from the cluster § Email marketing company – Object store with HBase and NFS
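Slide 17's point that operational scripts get simpler with NFS comes down to being able to use standard file APIs directly. The Python sketch below is a hypothetical helper pair, not anything MapR ships: grep_tree roughly mirrors grep -rn and add_group_read roughly mirrors chmod -R g+r on regular files. Pointed at an NFS-mounted cluster directory (assumed), it would run unchanged; the demo uses a temporary directory as a stand-in.

```python
import os
import re
import stat
import tempfile


def grep_tree(root: str, pattern: str) -> list[tuple[str, int, str]]:
    """Roughly `grep -rn PATTERN ROOT`, using plain file APIs."""
    rx = re.compile(pattern)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if rx.search(line):
                        hits.append((path, lineno, line.rstrip("\n")))
    return hits


def add_group_read(root: str) -> None:
    """Roughly `chmod -R g+r ROOT`, restricted to regular files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.stat(path).st_mode
            os.chmod(path, stat.S_IMODE(mode) | stat.S_IRGRP)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:  # stand-in for an NFS mount
        log = os.path.join(root, "app.log")
        with open(log, "w") as f:
            f.write("ok\nERROR disk full\nok\n")
        print(grep_tree(root, r"ERROR"))
        add_group_read(root)
        print(oct(stat.S_IMODE(os.stat(log).st_mode)))
```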
  • 19. ODBC § ODBC – Open DataBase Connectivity – Open standard API for accessing a SQL-based backend – Developed by Microsoft and Simba Technologies in 1992 § Flagship API for SQL-based BI and reporting – Excel, Tableau, MicroStrategy, Crystal Reports, … § Advanced ODBC drivers implement the ODBC 3.52 specification
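To illustrate what an open standard SQL API buys (slide 19), here is a hedged Python sketch built on the DB-API 2.0 interface rather than on ODBC itself: top_pages only uses cursor(), execute(), and fetchall() with '?' placeholders, so it runs against sqlite3 here and could, in principle, receive a pyodbc connection to an ODBC data source instead. The clicks table, the query, and the DSN name in the comment are hypothetical, and whether a given Hive ODBC driver accepts this exact SQL is an assumption.

```python
import sqlite3

# Uses only DB-API 2.0 calls and '?' placeholders (qmark style), which both
# sqlite3 and pyodbc support. Table name and query are hypothetical.


def top_pages(conn, min_hits: int) -> list[tuple[str, int]]:
    """Return (url, hits) pairs with at least min_hits hits, busiest first."""
    cur = conn.cursor()
    cur.execute(
        "SELECT url, COUNT(*) AS hits FROM clicks "
        "GROUP BY url HAVING COUNT(*) >= ? "
        "ORDER BY hits DESC, url",
        (min_hits,),
    )
    return [(row[0], row[1]) for row in cur.fetchall()]


if __name__ == "__main__":
    # Local stand-in for a SQL backend.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (url TEXT)")
    conn.executemany("INSERT INTO clicks VALUES (?)",
                     [("/a",), ("/b",), ("/a",), ("/c",), ("/a",), ("/b",)])
    print(top_pages(conn, 2))  # [('/a', 3), ('/b', 2)]

    # With an ODBC driver installed, the same function could instead receive,
    # for example (hypothetical DSN; requires the third-party pyodbc package):
    #   import pyodbc
    #   conn = pyodbc.connect("DSN=MapR Hive", autocommit=True)
```

The design point mirrors the slide: the reporting logic is written once against a standard interface, and the backend is chosen by whoever opens the connection.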
  • 20. MapR ODBC Driver § MapR provides a Hive ODBC 3.52 driver – Developed in partnership with ODBC co-developer Simba Technologies – Compliant with the ODBC 3.52 specification • 32- and 64-bit platform support • Windows and Linux § Enables direct SQL access to MapR-stored data by translating SQL to HiveQL § SQLizer enables seamless connectivity – Provides an ANSI SQL-92 front end – Targeted at existing apps that generate standard SQL queries – Transforms SQL queries into HiveQL queries
  • 21. Example: Tableau
  • 22. Example: Open source query builder (Kaimon)
  • 23. Example: Microsoft Excel
  • 24. In Summary § Open standards are important § Supporting existing applications and tools that implement those standards is valuable – Preserves investment in tools – Preserves investment in custom applications that preceded Hadoop – Leverages skills you already have
  • 25. Join MapR § Join the fastest-growing Hadoop company § Open positions in every discipline – Engineers – Solution Architects – Product Management § Email jobs@mapr.com
  • 26. Time for Questions § Download slides or send me an email – http://www.mapr.com/company/events/speaking/dc-hug-9-18-12 § Download MapR to learn more – www.mapr.com/download