SlideShare une entreprise Scribd logo
1  sur  17
Télécharger pour lire hors ligne
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
1	
1	
Evaluation  of  Cloudera  impala  1.1
Aug  7,  2013
CELLANT  Corp.  R&D  Strategy  Division
Yukinori  SUDA
@sudabon
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
v  Sentry  support:
l  Fine-‐‑‒grained  authorization
l  Role-‐‑‒based  authorization
v  Support  for  views
v  Performance  improvements
l  Parquet  columnar  performance
l  More  efficient  metadata  refresh  for  larger  installations
v  Additional  SQL
l  SQL-‐‑‒89  joins  (in  addition  to  existing  SQL-‐‑‒92)
l  LOAD  function
l  REFRESH  command  for  JDBC/ODBC
v  Improved  Hbase  support:
l  Binary  types
l  Caching  configuration
v  Fixed  many  bugs
Cloudera  Impala  1.1  was  released  !!
2
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
v Hive  ⇒  Impala
l On  Impala  shell,  can  read  data  in  “VIEW”  that  was  
created  via  Hive  command  ?
v Impala  ⇒  Hive
l On  Hive  shell,  can  read  data  in  “VIEW”  that  was  
created  via  Impala  command  ?
v Result
Two  “VIEW”s  have  compatibility
Check  compatibility  of  “VIEW”
3
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
Check  performance  (Hive  on  Cluster1)
4
0 50 100 150 200 250
No  Comp.
Gzip
Snappy
Gzip
Snappy
TextFileSequenceFileRCFile
222.039
244.67
239.182
228.801
230.327
Avg.  Job  Latency  [sec]
This result will be invalid as performance evaluation cause some data may be read remotely.
See the slide of “Check performance (Hive on Cluster2)”.
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
Check  performance  (Impala  on  Cluster1)
5
0 50 100 150 200 250
No  Comp.
Gzip
Snappy
Gzip
Snappy
Snappy
Text
File
Sequence
FileRCFile
Parquet
File
23.518
32.155
28.617
20.774
12.654
13.146
Avg.  Job  Latency  [sec]
This result will be invalid as performance evaluation
cause some data may be read remotely.
See the slide of “Check performance (Impala on Cluster2)”.
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
Check  performance  (Hive  on  Cluster2)
6
0 50 100 150 200 250 300
No  Comp.
Gzip
Snappy
Gzip
Snappy
TextFileSequenceFileRCFile
272.176
249.531
245.009
230.034
216.802
Avg.  Job  Latency  [sec]
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
Check  performance  (Impala  on  Cluster2)
7
0 50 100 150 200 250 300
No  Comp.
Gzip
Snappy
Gzip
Snappy
Snappy
Text
File
Sequence
FileRCFile
Parquet
File
32.528
28.73
21.173
24.794
14.308
19.814
Avg.  Job  Latency  [sec]
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
v IMPALA-‐‑‒357
l Insert  into  Parquet  exceed  mem-‐‑‒limit
v Problem
l Even  if  set  mem_̲limit  setting,  when  create  ParquetFile  
table  with  partitions,  consumed  memory  isnʼ’t  limited.  
l At  last,  Impalad  crashes  due  to  memory  shortage
v Result
CREATE  command  failed  due  to  memory  limit
Check  fixed  bug
8
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
v Thanks  to  dev.  team,  Impala  is  also  going  
from  “Good  to  Great”
v Both  “VIEW”  and  “Parquet”  are  already  ready
v Performance
v RCFile+Snappy  is  the  fastest  on  both  Cluster1  and  
Cluster2
v If  use  larger  size  table,  Parquet+Snappy  may  be  the  
fastest
v Hope  for  future  extension
l Support  Structure  Types
l Support  UDF/UDTF,  etc
Summary
9
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
10
Appendix.  Benchmark  Details
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
Our  System  Environment(Cluster1)
11
v  Install  using  Cloudera  Manager  Free  Edition  4.6.0
Master Slave
14  Servers
All  servers  are  connected  with  1Gbps  Ethernet  through  an  L2  switch
Active
NameNode
DataNode
TaskTracker
Impalad
Stand-‐‑‒by
NameNode
JobTracker
statestored
3  Servers
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
DataNode
DataNode
DataNode
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
Our  System  Environment(Cluster2)
12
v  Install  using  Cloudera  Manager  Free  Edition  4.6.0
Master Slave
10  Servers
All  servers  are  connected  with  1Gbps  Ethernet  through  an  L2  switch
Active
NameNode
DataNode
TaskTracker
Impalad
Stand-‐‑‒by
NameNode
JobTracker
statestored
3  Servers
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
DataNode
DataNode
DataNode
Decommissioned
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
v CPU
l Intel  Core  2  Duo  2.13  GHz  with  Hyper  Threading
v Memory
l 8GB  :  Namenodes  only
l 4GB  :  Others
v Disk
l 7,200  rpm  SATA  mechanical  Hard  Disk  Drive  *  1
v OS
l Cent  OS  6.3
Our  Server  Specification
13
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
v  Use  CDH4.3.0  +  Impala  1.1
v  Use  hivebench  in  open-‐‑‒sourced  benchmark  tool  “HiBench”
l  https://github.com/hibench
v  Modified  datasets  to  1/10  scale
l  Default  configuration  generates  table  with  1  billion  rows
v  Modified  query  sentence
l  Deleted  “INSERT  INTO  TABLE  …”  to  evaluate  read-‐‑‒only  performance
v  Combines  a  few  storage  format  with  a  few  compression  method
l  TextFile,  SequenceFile,  RCFile,  ParquestFile
l  No  compression,  Gzip,  Snappy
v  Comparison  with  job  query  latency
v  Average  job  latency  over  5  measurements
v  Benchmark  on  both  Cluster1  and  Cluster2
Benchmark
14
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
•  Uservisits  table
–  100  million  rows
–  16,895  MB  as  TextFile
–  Table  Definitions
•  sourceIP   string
•  destURL   string
•  visitDate   string
•  adRevenue   double
•  userAgent   string
•  countryCode   string
•  languageCode  string
•  searchWord   string
•  duration   int
•  Rankings  table
–  12  million  rows
–  744  MB  as  TextFile
–  Table  Definitions
•  pageURL string
•  pageRank int
•  avgDuration int
Modified  Datasets
15
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
SELECT
  sourceIP,
  sum(adRevenue)  as  totalRevenue,
  avg(pageRank)  
FROM
  rankings_̲t  R
JOIN  [BROADCAST]  (
  SELECT
    sourceIP,
    destURL,
    adRevenue
  FROM
    uservisits_̲t  UV
  WHERE
    (datediff(UV.visitDate,  '1999-‐‑‒01-‐‑‒01')>=0
    AND
    datediff(UV.visitDate,  '2000-‐‑‒01-‐‑‒01')<=0)
  )  NUV
ON
  (R.pageURL  =  NUV.destURL)
group  by  sourceIP
order  by  totalRevenue  DESC
limit  1;
Modified  Query
16
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
17
Thanks!
I  want  to  use  TPC  in  next  evaluation…

Contenu connexe

Tendances

Building Spark as Service in Cloud
Building Spark as Service in CloudBuilding Spark as Service in Cloud
Building Spark as Service in CloudInMobi Technology
 
PGConf.ASIA 2019 Bali - Foreign Data Wrappers - Etsuro Fujita & Tatsuro Yamada
PGConf.ASIA 2019 Bali - Foreign Data Wrappers - Etsuro Fujita & Tatsuro YamadaPGConf.ASIA 2019 Bali - Foreign Data Wrappers - Etsuro Fujita & Tatsuro Yamada
PGConf.ASIA 2019 Bali - Foreign Data Wrappers - Etsuro Fujita & Tatsuro YamadaEqunix Business Solutions
 
RedGateWebinar - Where did my CPU go?
RedGateWebinar - Where did my CPU go?RedGateWebinar - Where did my CPU go?
RedGateWebinar - Where did my CPU go?Kristofferson A
 
Replicating in Real-time from MySQL to Amazon Redshift
Replicating in Real-time from MySQL to Amazon RedshiftReplicating in Real-time from MySQL to Amazon Redshift
Replicating in Real-time from MySQL to Amazon RedshiftContinuent
 
OOW 2013: Where did my CPU go
OOW 2013: Where did my CPU goOOW 2013: Where did my CPU go
OOW 2013: Where did my CPU goKristofferson A
 
FortranCon2020: Highly Parallel Fortran and OpenACC Directives
FortranCon2020: Highly Parallel Fortran and OpenACC DirectivesFortranCon2020: Highly Parallel Fortran and OpenACC Directives
FortranCon2020: Highly Parallel Fortran and OpenACC DirectivesJeff Larkin
 
PGConf APAC 2018 - Monitoring PostgreSQL at Scale
PGConf APAC 2018 - Monitoring PostgreSQL at ScalePGConf APAC 2018 - Monitoring PostgreSQL at Scale
PGConf APAC 2018 - Monitoring PostgreSQL at ScalePGConf APAC
 
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...Equnix Business Solutions
 
Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016
Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016
Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016DataStax
 
NVIDIA Tesla K40 GPU
NVIDIA Tesla K40 GPUNVIDIA Tesla K40 GPU
NVIDIA Tesla K40 GPUCan Ozdoruk
 
An Overview of the IHK/McKernel Multi-kernel Operating System
An Overview of the IHK/McKernel Multi-kernel Operating SystemAn Overview of the IHK/McKernel Multi-kernel Operating System
An Overview of the IHK/McKernel Multi-kernel Operating SystemLinaro
 
HCQC : HPC Compiler Quality Checker
HCQC : HPC Compiler Quality CheckerHCQC : HPC Compiler Quality Checker
HCQC : HPC Compiler Quality CheckerLinaro
 
The Database Sizing Workflow
The Database Sizing WorkflowThe Database Sizing Workflow
The Database Sizing WorkflowKristofferson A
 
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar AhmedPGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar AhmedEqunix Business Solutions
 
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB ProjectRakuten Group, Inc.
 
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...Equnix Business Solutions
 
OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...
OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...
OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...Kristofferson A
 
Replicate from Oracle to data warehouses and analytics
Replicate from Oracle to data warehouses and analyticsReplicate from Oracle to data warehouses and analytics
Replicate from Oracle to data warehouses and analyticsContinuent
 
Whitepaper: Exadata Consolidation Success Story
Whitepaper: Exadata Consolidation Success StoryWhitepaper: Exadata Consolidation Success Story
Whitepaper: Exadata Consolidation Success StoryKristofferson A
 

Tendances (20)

Building Spark as Service in Cloud
Building Spark as Service in CloudBuilding Spark as Service in Cloud
Building Spark as Service in Cloud
 
PGConf.ASIA 2019 Bali - Foreign Data Wrappers - Etsuro Fujita & Tatsuro Yamada
PGConf.ASIA 2019 Bali - Foreign Data Wrappers - Etsuro Fujita & Tatsuro YamadaPGConf.ASIA 2019 Bali - Foreign Data Wrappers - Etsuro Fujita & Tatsuro Yamada
PGConf.ASIA 2019 Bali - Foreign Data Wrappers - Etsuro Fujita & Tatsuro Yamada
 
RedGateWebinar - Where did my CPU go?
RedGateWebinar - Where did my CPU go?RedGateWebinar - Where did my CPU go?
RedGateWebinar - Where did my CPU go?
 
Replicating in Real-time from MySQL to Amazon Redshift
Replicating in Real-time from MySQL to Amazon RedshiftReplicating in Real-time from MySQL to Amazon Redshift
Replicating in Real-time from MySQL to Amazon Redshift
 
OOW 2013: Where did my CPU go
OOW 2013: Where did my CPU goOOW 2013: Where did my CPU go
OOW 2013: Where did my CPU go
 
FortranCon2020: Highly Parallel Fortran and OpenACC Directives
FortranCon2020: Highly Parallel Fortran and OpenACC DirectivesFortranCon2020: Highly Parallel Fortran and OpenACC Directives
FortranCon2020: Highly Parallel Fortran and OpenACC Directives
 
Case Studies on PostgreSQL
Case Studies on PostgreSQLCase Studies on PostgreSQL
Case Studies on PostgreSQL
 
PGConf APAC 2018 - Monitoring PostgreSQL at Scale
PGConf APAC 2018 - Monitoring PostgreSQL at ScalePGConf APAC 2018 - Monitoring PostgreSQL at Scale
PGConf APAC 2018 - Monitoring PostgreSQL at Scale
 
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
 
Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016
Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016
Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016
 
NVIDIA Tesla K40 GPU
NVIDIA Tesla K40 GPUNVIDIA Tesla K40 GPU
NVIDIA Tesla K40 GPU
 
An Overview of the IHK/McKernel Multi-kernel Operating System
An Overview of the IHK/McKernel Multi-kernel Operating SystemAn Overview of the IHK/McKernel Multi-kernel Operating System
An Overview of the IHK/McKernel Multi-kernel Operating System
 
HCQC : HPC Compiler Quality Checker
HCQC : HPC Compiler Quality CheckerHCQC : HPC Compiler Quality Checker
HCQC : HPC Compiler Quality Checker
 
The Database Sizing Workflow
The Database Sizing WorkflowThe Database Sizing Workflow
The Database Sizing Workflow
 
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar AhmedPGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
 
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project
 
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
 
OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...
OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...
OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...
 
Replicate from Oracle to data warehouses and analytics
Replicate from Oracle to data warehouses and analyticsReplicate from Oracle to data warehouses and analytics
Replicate from Oracle to data warehouses and analytics
 
Whitepaper: Exadata Consolidation Success Story
Whitepaper: Exadata Consolidation Success StoryWhitepaper: Exadata Consolidation Success Story
Whitepaper: Exadata Consolidation Success Story
 

En vedette

Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad ranaData Con LA
 
(Aaron myers) hdfs impala
(Aaron myers)   hdfs impala(Aaron myers)   hdfs impala
(Aaron myers) hdfs impalaNAVER D2
 
ImpalaToGo introduction
ImpalaToGo introductionImpalaToGo introduction
ImpalaToGo introductionDavid Groozman
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...
Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...
Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...Gregg Barrett
 
Performance evaluation of cloudera impala (with Comparison to Hive)
Performance evaluation of cloudera impala (with Comparison to Hive)Performance evaluation of cloudera impala (with Comparison to Hive)
Performance evaluation of cloudera impala (with Comparison to Hive)Yukinori Suda
 

En vedette (6)

Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad rana
 
(Aaron myers) hdfs impala
(Aaron myers)   hdfs impala(Aaron myers)   hdfs impala
(Aaron myers) hdfs impala
 
ImpalaToGo introduction
ImpalaToGo introductionImpalaToGo introduction
ImpalaToGo introduction
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...
Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...
Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...
 
Performance evaluation of cloudera impala (with Comparison to Hive)
Performance evaluation of cloudera impala (with Comparison to Hive)Performance evaluation of cloudera impala (with Comparison to Hive)
Performance evaluation of cloudera impala (with Comparison to Hive)
 

Similaire à Evaluation of cloudera impala 1.1

Performance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to HivePerformance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to HiveYukinori Suda
 
What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0ScyllaDB
 
OpenStack in 10 minutes with Devstack
OpenStack in 10 minutes with DevstackOpenStack in 10 minutes with Devstack
OpenStack in 10 minutes with DevstackSean Dague
 
Switch as a Server - PuppetConf 2014 - Leslie Carr
Switch as a Server - PuppetConf 2014 - Leslie CarrSwitch as a Server - PuppetConf 2014 - Leslie Carr
Switch as a Server - PuppetConf 2014 - Leslie CarrCumulus Networks
 
SCM Puppet: from an intro to the scaling
SCM Puppet: from an intro to the scalingSCM Puppet: from an intro to the scaling
SCM Puppet: from an intro to the scalingStanislav Osipov
 
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NYApache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NYWangda Tan
 
Production Grade Kubernetes Applications
Production Grade Kubernetes ApplicationsProduction Grade Kubernetes Applications
Production Grade Kubernetes ApplicationsNarayanan Krishnamurthy
 
The Switch as a Server - PuppetConf 2014
The Switch as a Server - PuppetConf 2014The Switch as a Server - PuppetConf 2014
The Switch as a Server - PuppetConf 2014Puppet
 
Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Dave Holland
 
Pig on Tez - Low Latency ETL with Big Data
Pig on Tez - Low Latency ETL with Big DataPig on Tez - Low Latency ETL with Big Data
Pig on Tez - Low Latency ETL with Big DataDataWorks Summit
 
Katello on TorqueBox
Katello on TorqueBoxKatello on TorqueBox
Katello on TorqueBoxlzap
 
The Data Center and Hadoop
The Data Center and HadoopThe Data Center and Hadoop
The Data Center and HadoopDataWorks Summit
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weitingWei Ting Chen
 
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...Chris Fregly
 
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...Chris Fregly
 
Container orchestration from theory to practice
Container orchestration from theory to practiceContainer orchestration from theory to practice
Container orchestration from theory to practiceDocker, Inc.
 
Running Stateful Apps on Kubernetes
Running Stateful Apps on KubernetesRunning Stateful Apps on Kubernetes
Running Stateful Apps on KubernetesYugabyte
 
ApacheCon 2021 - Apache NiFi Deep Dive 300
ApacheCon 2021 - Apache NiFi Deep Dive 300ApacheCon 2021 - Apache NiFi Deep Dive 300
ApacheCon 2021 - Apache NiFi Deep Dive 300Timothy Spann
 
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...Vietnam Open Infrastructure User Group
 
N(ot)-o(nly)-(Ha)doop - the DAG showdown
N(ot)-o(nly)-(Ha)doop - the DAG showdownN(ot)-o(nly)-(Ha)doop - the DAG showdown
N(ot)-o(nly)-(Ha)doop - the DAG showdownDataWorks Summit
 

Similaire à Evaluation of cloudera impala 1.1 (20)

Performance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to HivePerformance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to Hive
 
What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0
 
OpenStack in 10 minutes with Devstack
OpenStack in 10 minutes with DevstackOpenStack in 10 minutes with Devstack
OpenStack in 10 minutes with Devstack
 
Switch as a Server - PuppetConf 2014 - Leslie Carr
Switch as a Server - PuppetConf 2014 - Leslie CarrSwitch as a Server - PuppetConf 2014 - Leslie Carr
Switch as a Server - PuppetConf 2014 - Leslie Carr
 
SCM Puppet: from an intro to the scaling
SCM Puppet: from an intro to the scalingSCM Puppet: from an intro to the scaling
SCM Puppet: from an intro to the scaling
 
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NYApache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
 
Production Grade Kubernetes Applications
Production Grade Kubernetes ApplicationsProduction Grade Kubernetes Applications
Production Grade Kubernetes Applications
 
The Switch as a Server - PuppetConf 2014
The Switch as a Server - PuppetConf 2014The Switch as a Server - PuppetConf 2014
The Switch as a Server - PuppetConf 2014
 
Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017
 
Pig on Tez - Low Latency ETL with Big Data
Pig on Tez - Low Latency ETL with Big DataPig on Tez - Low Latency ETL with Big Data
Pig on Tez - Low Latency ETL with Big Data
 
Katello on TorqueBox
Katello on TorqueBoxKatello on TorqueBox
Katello on TorqueBox
 
The Data Center and Hadoop
The Data Center and HadoopThe Data Center and Hadoop
The Data Center and Hadoop
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting
 
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
 
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
 
Container orchestration from theory to practice
Container orchestration from theory to practiceContainer orchestration from theory to practice
Container orchestration from theory to practice
 
Running Stateful Apps on Kubernetes
Running Stateful Apps on KubernetesRunning Stateful Apps on Kubernetes
Running Stateful Apps on Kubernetes
 
ApacheCon 2021 - Apache NiFi Deep Dive 300
ApacheCon 2021 - Apache NiFi Deep Dive 300ApacheCon 2021 - Apache NiFi Deep Dive 300
ApacheCon 2021 - Apache NiFi Deep Dive 300
 
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
 
N(ot)-o(nly)-(Ha)doop - the DAG showdown
N(ot)-o(nly)-(Ha)doop - the DAG showdownN(ot)-o(nly)-(Ha)doop - the DAG showdown
N(ot)-o(nly)-(Ha)doop - the DAG showdown
 

Plus de Yukinori Suda

Hadoop operation chaper 4
Hadoop operation chaper 4Hadoop operation chaper 4
Hadoop operation chaper 4Yukinori Suda
 
Cloudera Impalaをサービスに組み込むときに苦労した話
Cloudera Impalaをサービスに組み込むときに苦労した話Cloudera Impalaをサービスに組み込むときに苦労した話
Cloudera Impalaをサービスに組み込むときに苦労した話Yukinori Suda
 
Hadoopエコシステムを駆使したこれからのWebアクセス解析サービス
Hadoopエコシステムを駆使したこれからのWebアクセス解析サービスHadoopエコシステムを駆使したこれからのWebアクセス解析サービス
Hadoopエコシステムを駆使したこれからのWebアクセス解析サービスYukinori Suda
 
自宅でHive愛を育む方法 〜Raspberry Pi編〜
自宅でHive愛を育む方法 〜Raspberry Pi編〜自宅でHive愛を育む方法 〜Raspberry Pi編〜
自宅でHive愛を育む方法 〜Raspberry Pi編〜Yukinori Suda
 
⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)
⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)
⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)Yukinori Suda
 
HiveとImpalaのおいしいとこ取り
HiveとImpalaのおいしいとこ取りHiveとImpalaのおいしいとこ取り
HiveとImpalaのおいしいとこ取りYukinori Suda
 
Cloudera impalaの性能評価(Hiveとの比較)
Cloudera impalaの性能評価(Hiveとの比較)Cloudera impalaの性能評価(Hiveとの比較)
Cloudera impalaの性能評価(Hiveとの比較)Yukinori Suda
 

Plus de Yukinori Suda (7)

Hadoop operation chaper 4
Hadoop operation chaper 4Hadoop operation chaper 4
Hadoop operation chaper 4
 
Cloudera Impalaをサービスに組み込むときに苦労した話
Cloudera Impalaをサービスに組み込むときに苦労した話Cloudera Impalaをサービスに組み込むときに苦労した話
Cloudera Impalaをサービスに組み込むときに苦労した話
 
Hadoopエコシステムを駆使したこれからのWebアクセス解析サービス
Hadoopエコシステムを駆使したこれからのWebアクセス解析サービスHadoopエコシステムを駆使したこれからのWebアクセス解析サービス
Hadoopエコシステムを駆使したこれからのWebアクセス解析サービス
 
自宅でHive愛を育む方法 〜Raspberry Pi編〜
自宅でHive愛を育む方法 〜Raspberry Pi編〜自宅でHive愛を育む方法 〜Raspberry Pi編〜
自宅でHive愛を育む方法 〜Raspberry Pi編〜
 
⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)
⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)
⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)
 
HiveとImpalaのおいしいとこ取り
HiveとImpalaのおいしいとこ取りHiveとImpalaのおいしいとこ取り
HiveとImpalaのおいしいとこ取り
 
Cloudera impalaの性能評価(Hiveとの比較)
Cloudera impalaの性能評価(Hiveとの比較)Cloudera impalaの性能評価(Hiveとの比較)
Cloudera impalaの性能評価(Hiveとの比較)
 

Dernier

Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 

Dernier (20)

Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 

Evaluation of cloudera impala 1.1

  • 1. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / 1 1 Evaluation  of  Cloudera  impala  1.1 Aug  7,  2013 CELLANT  Corp.  R&D  Strategy  Division Yukinori  SUDA @sudabon
  • 2. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / v  Sentry  support: l  Fine-‐‑‒grained  authorization l  Role-‐‑‒based  authorization v  Support  for  views v  Performance  improvements l  Parquet  columnar  performance l  More  efficient  metadata  refresh  for  larger  installations v  Additional  SQL l  SQL-‐‑‒89  joins  (in  addition  to  existing  SQL-‐‑‒92) l  LOAD  function l  REFRESH  command  for  JDBC/ODBC v  Improved  Hbase  support: l  Binary  types l  Caching  configuration v  Fixed  many  bugs Cloudera  Impala  1.1  was  released  !! 2
  • 3. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / v Hive  ⇒  Impala l On  Impala  shell,  can  read  data  in  “VIEW”  that  was   created  via  Hive  command  ? v Impala  ⇒  Hive l On  Hive  shell,  can  read  data  in  “VIEW”  that  was   created  via  Impala  command  ? v Result Two  “VIEW”s  have  compatibility Check  compatibility  of  “VIEW” 3
  • 4. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / Check  performance  (Hive  on  Cluster1) 4 0 50 100 150 200 250 No  Comp. Gzip Snappy Gzip Snappy TextFileSequenceFileRCFile 222.039 244.67 239.182 228.801 230.327 Avg.  Job  Latency  [sec] This result will be invalid as performance evaluation cause some data may be read remotely. See the slide of “Check performance (Hive on Cluster2)”.
  • 5. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / Check  performance  (Impala  on  Cluster1) 5 0 50 100 150 200 250 No  Comp. Gzip Snappy Gzip Snappy Snappy Text File Sequence FileRCFile Parquet File 23.518 32.155 28.617 20.774 12.654 13.146 Avg.  Job  Latency  [sec] This result will be invalid as performance evaluation cause some data may be read remotely. See the slide of “Check performance (Impala on Cluster2)”.
  • 6. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / Check  performance  (Hive  on  Cluster2) 6 0 50 100 150 200 250 300 No  Comp. Gzip Snappy Gzip Snappy TextFileSequenceFileRCFile 272.176 249.531 245.009 230.034 216.802 Avg.  Job  Latency  [sec]
  • 7. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / Check  performance  (Impala  on  Cluster2) 7 0 50 100 150 200 250 300 No  Comp. Gzip Snappy Gzip Snappy Snappy Text File Sequence FileRCFile Parquet File 32.528 28.73 21.173 24.794 14.308 19.814 Avg.  Job  Latency  [sec]
  • 8. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / v IMPALA-‐‑‒357 l Insert  into  Parquet  exceed  mem-‐‑‒limit v Problem l Even  if  set  mem_̲limit  setting,  when  create  ParquetFile   table  with  partitions,  consumed  memory  isnʼ’t  limited.   l At  last,  Impalad  crashes  due  to  memory  shortage v Result CREATE  command  failed  due  to  memory  limit Check  fixed  bug 8
  • 9. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / v Thanks  to  dev.  team,  Impala  is  also  going   from  “Good  to  Great” v Both  “VIEW”  and  “Parquet”  are  already  ready v Performance v RCFile+Snappy  is  the  fastest  on  both  Cluster1  and   Cluster2 v If  use  larger  size  table,  Parquet+Snappy  may  be  the   fastest v Hope  for  future  extension l Support  Structure  Types l Support  UDF/UDTF,  etc Summary 9
  • 10. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / 10 Appendix.  Benchmark  Details
  • 11. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / Our  System  Environment(Cluster1) 11 v  Install  using  Cloudera  Manager  Free  Edition  4.6.0 Master Slave 14  Servers All  servers  are  connected  with  1Gbps  Ethernet  through  an  L2  switch Active NameNode DataNode TaskTracker Impalad Stand-‐‑‒by NameNode JobTracker statestored 3  Servers DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode DataNode DataNode DataNode
  • 12. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / Our  System  Environment(Cluster2) 12 v  Install  using  Cloudera  Manager  Free  Edition  4.6.0 Master Slave 10  Servers All  servers  are  connected  with  1Gbps  Ethernet  through  an  L2  switch Active NameNode DataNode TaskTracker Impalad Stand-‐‑‒by NameNode JobTracker statestored 3  Servers DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode DataNode DataNode DataNode Decommissioned
  • 13. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / v CPU l Intel  Core  2  Duo  2.13  GHz  with  Hyper  Threading v Memory l 8GB  :  Namenodes  only l 4GB  :  Others v Disk l 7,200  rpm  SATA  mechanical  Hard  Disk  Drive  *  1 v OS l Cent  OS  6.3 Our  Server  Specification 13
  • 14. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / v  Use  CDH4.3.0  +  Impala  1.1 v  Use  hivebench  in  open-‐‑‒sourced  benchmark  tool  “HiBench” l  https://github.com/hibench v  Modified  datasets  to  1/10  scale l  Default  configuration  generates  table  with  1  billion  rows v  Modified  query  sentence l  Deleted  “INSERT  INTO  TABLE  …”  to  evaluate  read-‐‑‒only  performance v  Combines  a  few  storage  format  with  a  few  compression  method l  TextFile,  SequenceFile,  RCFile,  ParquestFile l  No  compression,  Gzip,  Snappy v  Comparison  with  job  query  latency v  Average  job  latency  over  5  measurements v  Benchmark  on  both  Cluster1  and  Cluster2 Benchmark 14
  • 15. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / •  Uservisits  table –  100  million  rows –  16,895  MB  as  TextFile –  Table  Definitions •  sourceIP  string •  destURL  string •  visitDate  string •  adRevenue  double •  userAgent  string •  countryCode  string •  languageCode  string •  searchWord  string •  duration  int •  Rankings  table –  12  million  rows –  744  MB  as  TextFile –  Table  Definitions •  pageURL string •  pageRank int •  avgDuration int Modified  Datasets 15
  • 16. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / SELECT   sourceIP,   sum(adRevenue)  as  totalRevenue,   avg(pageRank)   FROM   rankings_̲t  R JOIN  [BROADCAST]  (   SELECT     sourceIP,     destURL,     adRevenue   FROM     uservisits_̲t  UV   WHERE     (datediff(UV.visitDate,  '1999-‐‑‒01-‐‑‒01')>=0     AND     datediff(UV.visitDate,  '2000-‐‑‒01-‐‑‒01')<=0)   )  NUV ON   (R.pageURL  =  NUV.destURL) group  by  sourceIP order  by  totalRevenue  DESC limit  1; Modified  Query 16
  • 17. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / 17 Thanks! I  want  to  use  TPC  in  next  evaluation…