Impala Resource Management - OUTDATED

  • 1. Impala Resource Management: A Brief Overview
       Matthew Jacobs | @mattjacobs
       November 2015
       Relevant through Impala 2.2/CDH5.4
       © Cloudera, Inc. All rights reserved.
  • 2. Impala Resource Management: Overview
       • Problem: how to best utilize cluster resources
       State of the world as of Impala 2.2/CDH5.4:
       • Within Impala
         • READY FOR USE: built-in Admission Control (introduced in Impala 1.3/CDH 5.0)
       • Between Impala and the rest of the world
         • READY FOR USE: “Static Partitioning” from Cloudera Manager
         • NOT READY: integration with YARN
           • Experimental integration shipped in Impala 1.3/CDH 5.0
           • Some known issues exist; do not use it today! More on this later…
           • We’re actively working on this, stay tuned!
  • 3. Talk Overview
       This is a very brief overview! There are many details we can’t cover in 20 minutes :(
       • How to be successful today (including with Impala 2.3/CDH5.5)
       • Overview of Impala on YARN
         • Architecture
         • Why you can’t use it yet
         • How it might look when you can
  • 4. “Resource Management” Today
       • Use one or both of:
         • Static Partitioning with Cloudera Manager (also called “Static Service Pools”)
         • Impala’s built-in Admission Control
       • Static Partitioning: dedicate resources to Impala, HBase, YARN, etc.
         • Easy to use and works well. Set up by Cloudera Manager, uses cgroups
         • E.g. Impala gets 100GB/30% CPU, HBase gets 50GB/20% CPU, etc.
       • Admission Control: throttle Impala queries
         • Set a limit on the max number of queries or the max memory used by those queries
         • E.g. queue queries once more than 20 queries are running concurrently, or queue once more than 100GB is used
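The throttling rule on this slide can be sketched as a small admission check. This is only an illustration of the idea, not Impala's implementation: the class names, field names, and the exact comparison rules are assumptions made for the example.

```python
from dataclasses import dataclass

GB = 1024 ** 3


@dataclass
class PoolLimits:
    """Hypothetical pool settings; the field names are illustrative, not Impala options."""
    max_running_queries: int
    max_memory_bytes: int


@dataclass
class PoolState:
    """What the pool is currently using."""
    running_queries: int = 0
    memory_in_use_bytes: int = 0


def should_queue(limits: PoolLimits, state: PoolState, query_memory_bytes: int) -> bool:
    """Return True if a new query should wait instead of being admitted.

    Assumed rule: queue when admitting the query would exceed either the
    concurrency limit or the memory limit of the pool.
    """
    too_many_queries = state.running_queries >= limits.max_running_queries
    too_much_memory = (
        state.memory_in_use_bytes + query_memory_bytes > limits.max_memory_bytes
    )
    return too_many_queries or too_much_memory


# The slide's example: at most 20 concurrent queries, at most 100GB in use.
limits = PoolLimits(max_running_queries=20, max_memory_bytes=100 * GB)
print(should_queue(limits, PoolState(19, 50 * GB), 5 * GB))
print(should_queue(limits, PoolState(20, 10 * GB), 1 * GB))
```

Either limit alone is enough to queue a query, which matches the "max queries or max memory" wording of the slide.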
  • 5. When to Use AC? Static Partitioning?
       With Static Partitioning, with Admission Control:
         • Using Impala with other systems (e.g. Hive, Spark) and need to guarantee each gets resources
         • Heavy Impala workload, need to make sure queries aren’t stepping on each other
       With Static Partitioning, without Admission Control:
         • Using Impala with other systems and need to guarantee each gets resources
         • Light to moderate Impala workload, not using all available resources yet
       Without Static Partitioning, with Admission Control:
         • Impala-only cluster, or other systems have very light, non-competing workloads
         • Heavy Impala workload, need to make sure queries aren’t stepping on each other
       Without Static Partitioning, without Admission Control:
         • Enough cluster resources are available for all workloads to consume as much as necessary
  • 6. (Aside: A Plethora of Mem Limits)
       • Process (impalad) memory limit
         • Max memory the process can use across all queries. When a query consumes memory such that the process hits this limit, the query is killed
         • Set with the “--mem_limit” impalad command-line argument, or “Impala Daemon Memory Limit” in CM. The value is specified in terms of single-impalad memory.
       • Pool (admission control) memory limit
         • Max memory the queries in a pool/queue can use. The value is used only to admit queries; it is not enforced once queries are admitted. The value is specified as the cluster-wide limit, i.e. the aggregate limit across all impalads.
         • http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_admission.html
       • Query (query option) memory limit
         • Max memory a query can use; if a query uses more than this, it may have to be killed (if it can’t spill).
         • Set via the “set mem_limit=Xg” query option. A default query option can be set via impalad command-line arguments (see the next slide).
         • The value is specified in terms of single-impalad memory, e.g. Xg per node
         • http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_mem_limit.html
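Because the query and process limits are per impalad while the pool limit is cluster-wide, it helps to put the three on the same scale before comparing them. The arithmetic below is a rough sketch under two stated assumptions: every query runs on every node, and a query counts against the pool as its per-node limit multiplied by the number of nodes. The function names are made up for the example.

```python
GB = 1024 ** 3


def cluster_wide_bytes(per_node_bytes: int, num_nodes: int) -> int:
    """Scale a per-impalad figure up to a cluster-wide figure (assumes all nodes participate)."""
    return per_node_bytes * num_nodes


def queries_that_fit(pool_limit_bytes: int, per_node_query_limit_bytes: int,
                     num_nodes: int) -> int:
    """How many queries with the given per-node limit fit under a cluster-wide pool limit."""
    per_query = cluster_wide_bytes(per_node_query_limit_bytes, num_nodes)
    return pool_limit_bytes // per_query


def fits_process_limit(concurrent_queries: int, per_node_query_limit_bytes: int,
                       process_limit_bytes: int) -> bool:
    """Check that concurrent queries at their per-node limit stay under one impalad's limit."""
    return concurrent_queries * per_node_query_limit_bytes <= process_limit_bytes


# 10 nodes, mem_limit=5g per node, 100GB pool: each query counts as 50GB cluster-wide.
print(queries_that_fit(100 * GB, 5 * GB, 10))
print(fits_process_limit(2, 5 * GB, 16 * GB))
```

The point of the exercise is the unit mismatch: a 5g query limit looks small next to a 100GB pool until it is multiplied by the node count.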
  • 7. Important! AC with Mem Limits is Tricky
       • Admission based on pool memory limits will use:
         • The query memory limit, if it is set (set MEM_LIMIT=Xg;)
         • Otherwise, an estimate from planning, which is usually wrong!
       • Do not use pool memory limits unless you also set query memory limits
         • Consider setting a default value for the ‘mem_limit’ query option
         • Set via the ‘--default_query_options’ impalad argument
         • E.g. --default_query_options='mem_limit=5g'
         • You can still override the default with the ‘set mem_limit=X;’ query option.
       • Picking a good memory limit is hard; use CM’s charts to help understand your workload
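The fallback order described on this slide can be written down as a short function. This is a sketch of the decision only, with hypothetical function names; the size parser is deliberately minimal and handles just plain byte counts and the "m" and "g" suffixes used in the slide's examples.

```python
from typing import Optional

_UNITS = {"m": 1024 ** 2, "g": 1024 ** 3}


def parse_mem(value: str) -> int:
    """Parse a size such as '5g' or '512m' into bytes (illustrative subset of formats)."""
    text = value.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def memory_for_admission(query_mem_limit: Optional[str],
                         default_mem_limit: Optional[str],
                         planner_estimate_bytes: int) -> int:
    """Pick the figure admission would use, in the order the slide describes.

    1. The per-query 'set mem_limit=...' value, if present.
    2. The default from --default_query_options, if present.
    3. Otherwise the planner's estimate, which the slide warns is usually wrong.
    """
    for candidate in (query_mem_limit, default_mem_limit):
        if candidate is not None:
            return parse_mem(candidate)
    return planner_estimate_bytes


print(memory_for_admission("2g", "5g", planner_estimate_bytes=123))
print(memory_for_admission(None, "5g", planner_estimate_bytes=123))
print(memory_for_admission(None, None, planner_estimate_bytes=123))
```

Setting a default means the third branch is never reached, which is the practical reason the slide recommends it.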
  • 8. “Resource Management” Today, Summary
       • Today: use Admission Control and Static Partitioning
       • We skipped over a lot of details; see the docs for more information
         • Impala Admission Control:
           http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_admission.html
         • “Static Partitioning” in Cloudera Manager (also called “Static Service Pools”):
           http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_mc_service_pools.html
       • Ask us questions on impala-user@cloudera.org
  • 9. Impala on YARN
       • YARN is a “resource negotiator” that helps share cluster resources within Hadoop
       • Works well for MapReduce and similar batch-oriented processing engines
       • Doesn’t work well for services/frameworks that need:
         • Long-running processes
         • Gang scheduling
         • Very low-latency scheduling
       • Doesn’t work so well for Impala
         • (And also HBase, MPI, Presto, custom apps, etc.)
  • 10. Llama to the Rescue
       • Llama = Low Latency Application MAster
       • On GitHub: http://cloudera.github.io/llama/index.html
       • An interface between:
         • YARN’s ApplicationMaster (AM) model (batch jobs where tasks are each a process, coordinated by an AM)
         • Impala’s low-latency, in-process query model
       • Llama provides:
         • Gang scheduling
         • “Container” caching (to reduce resource acquisition cost)
  • 11-15. How Llama fits in
       [Architecture diagram built up over five slides; only the slide title survives in the transcript.]
  • 16. Gang scheduling
       • YARN returns resources in a trickle, as they become available
       • For MR this is perfect, as tasks are mostly independent (and checkpoint to disk)
       • For low-latency queries, we require all resources to be available at once so that query tasks can stream results to one another
       • Llama buffers resources between YARN and Impala to make resource requests appear atomic and indivisible
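The buffering idea on this slide can be illustrated with a toy class that holds trickled allocations until the whole request is satisfied. This is a conceptual sketch, not Llama's code or API; `GangBuffer` and `offer` are hypothetical names, and containers are represented as plain strings.

```python
from typing import List, Optional


class GangBuffer:
    """Hold containers as they trickle in and release them only as a complete set."""

    def __init__(self, needed: int) -> None:
        if needed <= 0:
            raise ValueError("needed must be positive")
        self.needed = needed
        self._held: List[str] = []

    def offer(self, container: str) -> Optional[List[str]]:
        """Accept one container; return the full gang once complete, else None."""
        self._held.append(container)
        if len(self._held) < self.needed:
            return None  # still waiting, the caller sees nothing yet
        gang, self._held = self._held, []
        return gang


buffer = GangBuffer(needed=3)
for name in ["c1", "c2", "c3"]:
    gang = buffer.offer(name)
    print(name, "->", gang)
```

From the requester's side the allocation looks all-or-nothing, even though the underlying scheduler handed out resources one at a time.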
  • 17. Resource caching
       • Every container requires YARN to make an expensive resource allocation decision
       • We ask Llama to cache resources between requests
       • Containers stay in their queue in Llama until YARN forcefully reclaims them
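A per-queue cache of idle containers captures the behaviour described here. The sketch below is an illustration under assumptions, not Llama's implementation: `ContainerCache`, its methods, and the `allocate` callback (a stand-in for the expensive YARN allocation) are all hypothetical.

```python
from typing import Callable, Dict, List


class ContainerCache:
    """Keep released containers per queue so later requests can skip a fresh allocation."""

    def __init__(self, allocate: Callable[[str], str]) -> None:
        self._allocate = allocate  # stand-in for the expensive scheduler call
        self._idle: Dict[str, List[str]] = {}

    def acquire(self, queue: str) -> str:
        """Reuse an idle container from the queue if one exists, else allocate a new one."""
        idle = self._idle.get(queue, [])
        if idle:
            return idle.pop()
        return self._allocate(queue)

    def release(self, queue: str, container: str) -> None:
        """Return a container to the cache instead of handing it back right away."""
        self._idle.setdefault(queue, []).append(container)

    def reclaim(self, queue: str) -> List[str]:
        """Model the scheduler forcefully taking back all idle containers of a queue."""
        return self._idle.pop(queue, [])


allocations = []


def fake_allocate(queue: str) -> str:
    allocations.append(queue)
    return f"{queue}-container-{len(allocations)}"


cache = ContainerCache(fake_allocate)
first = cache.acquire("analytics")
cache.release("analytics", first)
second = cache.acquire("analytics")  # served from the cache, no new allocation
print(first == second, len(allocations))
```

The trade-off is that cached containers are held while idle, which is why the slide notes that YARN can still reclaim them.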
  • 18. Impala on YARN: Current Status
       • Experimental integration was shipped in Impala 1.3 / CDH 5.0
       • Not ready for use yet!
         • A number of known bugs; see umbrella JIRA IMPALA-2370 to track them
         • Some (but not all) important fixes are in the upcoming Impala 2.3 / CDH 5.5 release
         • Ongoing scale and performance testing work is needed to provide guidance
       • In a future release (post-Impala 2.3), we will be able to recommend usage for some workloads, with guidance
  • 19. Thank you
       @mattjacobs