© 2019 Walmart Inc. All Rights Reserved
Hive LLAP : A High Performance, Cost-Effective Alternative to
Traditional MPP Databases
Any reference in this presentation to any specific commercial product, process, or service, or the use of any trade, firm, or corporation
name is for information and convenience only and is not an endorsement, favor, or recommendation by Walmart Inc.
Naveen Peddamail
Sr. Manager, Global Data
Abhishek Gupta
Data Engineer, Global Data
Agenda
• Introduction to Walmart
• Data Lake Initiative – Building a Single Source of Truth
• Challenges Around Low Latency Querying on Hadoop – Hive LLAP as a Solution
• Performance & Cost Effectiveness of Hive LLAP vs. MPP Databases
• Conclusion & Next Steps
• Q & A
About Walmart
• Largest retailer in the world and Fortune 1 company
• Serves over 275M customers weekly
• Employs over 2.2M associates worldwide
• 11,300 stores under 58 banners in 27 countries
• eCommerce websites in 10 countries & brands include:
  • Walmart.com
  • Jet.com
  • Hayneedle.com (home furnishings)
  • Shoes.com (footwear)
  • Moosejaw (outdoor apparel and gear)
  • ModCloth (women’s apparel)
  • Bonobos (men’s apparel)
To find out more, visit us at https://corporate.walmart.com
About Walmart Labs
• Employs over 4,000 associates worldwide
• Development centers in the US, India, and Ireland
• Open source projects include:
  • Hapi (server framework for Node.js)
  • OneOps (cloud management platform)
  • Electrode (universal React/Node.js platform)
  • TestArmada (suite of testing tools)
• Includes Global Data and Analytics Platform team
To find out more, visit us at:
https://www.walmartlabs.com
https://www.facebook.com/WalmartLabs
https://twitter.com/WalmartLabs
https://github.com/walmartlabs
Data Lake Initiative – Building a Single Source of Truth
Data Landscape at Walmart
• Transactional systems from various domains generate a huge volume of data every second
  • Sales & Orders
  • Merchandising
  • Logistics & Supply Chain
  • Real Estate
  • HR Systems
  • Compliance
• Analytical & reporting databases spread across various platforms and teams
• Challenges in correctly identifying the source of truth
• Data quality, governance, metadata management & lineage were difficult to manage
• Need to build a single source of truth – Data Lake
Criteria for the Data Lake
Governed, Secured & Certified Data
Single Source of Truth
Lower Total Cost of Ownership
Robust and Fast Data Access & Reporting
Data Lake @ Walmart
01 Central Analytical Data Source
• Central true source of analytical data across Walmart
02 Data Service Layer
• Common services for metadata
• ETL pipeline
• Data quality framework
03 Governed and Secure
• Roles to manage access control
• Encryption for sensitive data elements
• Providing end-to-end lineage
04 Self Service Platform
• Enable ad-hoc analysis
• Improve speed to market for analysis
• Providing a self-service storage and compute platform
Are the Business Users Happy Now?
Governed, Secured & Certified Data
Single Source of Truth
Lower Total Cost of Ownership
Robust and Fast Data Access & Reporting
Low Latency Querying on Hadoop: Hive LLAP as a Solution
Challenges
• Ad hoc query performance on Hadoop/Hive was unsatisfactory
• Users benchmarked against Massively Parallel Processing Enterprise Data Warehouses (MPP EDWs)
• Migrating some teams off Enterprise Data Warehouses was not possible until better query response times could be guaranteed
• Queries migrated from other data warehouses were not optimal for querying on Hive
Potential Solutions
• Tune queries for optimal Hive performance
• Recommend Tez as the default execution engine
• Hive LLAP as a performance booster
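As a rough illustration of the last two options, a Hive session could be pointed at Tez and LLAP with settings like the ones below. Both property names are standard Hive configuration keys. The deck does not say whether these were set per session or cluster-wide, so the session-level SET form here is an assumption.

```sql
-- Minimal sketch (assumption: applied per session rather than in hive-site.xml).

-- Run queries on Tez instead of the legacy MapReduce engine.
SET hive.execution.engine=tez;

-- Ask Hive to consider all query operators for execution inside LLAP daemons.
-- 'all' is the value listed on the environment setup slide of this deck.
SET hive.llap.execution.mode=all;
```

Settings applied this way last only for the current session; a cluster-wide default would normally be managed through the cluster's configuration tooling instead.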
Hive LLAP: Low Latency Analytical Processing
(Also known as Live Long and Process)
• JIT Optimization & In-Memory Caching
• Data Sharing, Asynchronous I/O
• Leverages Long-Lived Daemons
• Bridges Inefficiencies of Execution Engines
Hive LLAP Architecture
Source: https://hortonworks.com/blog/top-5-performance-boosters-with-apache-hive-llap/
Hive LLAP – Reviewing TPC-DS Benchmarks on HDP 2.6
Source: https://hortonworks.com/blog/3x-faster-interactive-query-hive-llap/
Similarities To Walmart’s Data Model
• The 10 TB scale and the data model of the underlying tables were similar to our use case
• Hive LLAP benchmarks looked promising for TPC-DS data
Differences From Walmart’s Data Model
• Wider Tables
• Complex Dimension tables
Hive LLAP – POC
POC GOALS
• Benchmark Hive LLAP query performance on 3NF tables involving joins
• Compare Hive LLAP query performance vs. MPP-EDWs on the same set of queries
DATA MODEL
Hive LLAP – Environment Setup
• Hadoop Distribution – HDP 2.6.3
• YARN Scheduler – Capacity Scheduler with pre-emption enabled
• Number of LLAP Nodes – Two configurations: 10 nodes and 15 nodes
• Hardware – 256 GB RAM, 32 cores, and 14 × 6 TB disks. Incremental spend: ~$150K
• Overall Hadoop Cluster Nodes – 90 Nodes
Hive LLAP – Environment Setup
YARN Config
  Nodemanager Max Container Size (MB)    230400
  Number of LLAP nodes                   10 & 15 (two variations)
LLAP Configs
  hive.llap.execution.mode               all
  hive.llap.io.memory.mode               cache
  hive.llap.io.enabled                   true
  Slider Memory                          2048
  tez.am.resource.memory.mb              2048
  LLAP Daemon Container Max Headroom     8192
  Number of concurrent queries           10
  Memory per Daemon                      226304
  Number of executors per LLAP Daemon    44
  hive.llap.io.threadpool.size           44
  LLAP Daemon Heap Size (MB)             171213
  In-Memory Cache per Daemon (MB)        46899
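The daemon memory figures above appear to add up as follows. This is a sanity check rather than a statement of how the values were chosen: it assumes every figure is in MB (the slide labels units only for the container size, heap, and cache) and that the daemon container is split into heap, cache, and headroom.

```latex
\begin{align*}
\text{heap} + \text{cache} + \text{headroom}
  &= 171213 + 46899 + 8192 \\
  &= 226304\ \text{MB} \quad (\text{Memory per Daemon}) \\[4pt]
\text{NodeManager max container} - \text{Memory per Daemon}
  &= 230400 - 226304 \\
  &= 4096\ \text{MB}
\end{align*}
```

The remaining 4096 MB equals Slider Memory plus tez.am.resource.memory.mb (2048 + 2048), although the deck does not say whether that was the intent.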
Hive LLAP – Query Patterns & Stats
Query Characteristics
• Queries fall mainly into reporting & ad-hoc workloads with a focus on business applications
• Aggregations of key metrics across various location, item & timeframe dimensions
• Scans involving large tables & joins on multiple tables
• Sorting across various dimensions & facts
• 48 queries over 4 time frames
Table Stats
• Fact table (1 year of data): ~70 billion rows, 12 TB
• Dimensions (1 key table): ~25 million rows, 110 GB
Sample Query
SELECT l.column1, l.column2, i.column3, i.column4,
       d.column5, sum(s.column6), sum(s.column7),
       avg(s.column8), avg(s.column9)
       ….
       ….
       ….
FROM sales AS s
JOIN item_dim AS i ON s.item_id = i.item_id
JOIN location_dim AS l ON s.location_id = l.location_id
JOIN date_dim AS d ON s.visit_dt = d.cal_dt
WHERE s.column10 BETWEEN <val1> AND <val2>
  AND l.column11 = <val3>
  …
  …
GROUP BY
  l.column1, l.column2, i.column3, i.column4, d.column5
ORDER BY
  l.column1, l.column2, i.column3, i.column4, d.column5;
Hive LLAP – Results
[Chart: Hive LLAP Performance Benchmark. Y-axis: Execution Time (seconds), 0 to 450. Queries grouped by time frame: 1 Week, 4 Weeks, 12 Weeks, 52 Weeks.]
75% of the queries ran in < 100 secs
Hive LLAP – Results
30% - 50% performance improvement between the 10-node and 15-node configurations
[Chart: Hive LLAP Query Performance for 10 vs. 15 Nodes - Linear Scalability. Y-axis: Execution Time (seconds), 0 to 600. X-axis: Queries, grouped by time frame: 1 Week, 4 Weeks, 12 Weeks, 52 Weeks. Series: LLAP - 15 Nodes, LLAP - 10 Nodes.]
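If scaling were perfectly linear, going from 10 to 15 nodes would be expected to cut execution time by about a third, which sits inside the reported 30% to 50% range. A quick check follows, where $T_{10}$ and $T_{15}$ are hypothetical symbols for the runtime of the same query on each configuration, and "improvement" is assumed to mean the reduction in runtime:

```latex
\begin{align*}
T_{15} &= \frac{10}{15}\, T_{10} \approx 0.667\, T_{10} \\[4pt]
\text{improvement} &= 1 - \frac{T_{15}}{T_{10}} = 1 - \frac{10}{15} \approx 33\%
\end{align*}
```

Under that reading, queries at the upper end of the range improved by more than the added node count alone would predict; the deck does not say why.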
Comparing Query Performance of Hive LLAP vs. MPP-EDWs
• For our comparative analysis, we used two MPP-EDW clusters
• Queries in the MPP-EDW clusters were optimized for best performance
Hadoop Cluster: ~4 TB Memory, 480 VCores
MPP EDW A: ~4 TB Memory, 512 VCores
MPP EDW B: ~16 TB Memory, 840 VCores
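The Hadoop cluster figures are consistent with the 15-node LLAP configuration from the environment setup, assuming the hardware line there (256 GB RAM, 32 cores) is per node. The same arithmetic shows how MPP EDW B compares on memory and cores:

```latex
\begin{align*}
15 \times 256\ \text{GB} &= 3840\ \text{GB} \approx 4\ \text{TB} \\
15 \times 32\ \text{cores} &= 480\ \text{VCores} \\[4pt]
\frac{16\ \text{TB}}{4\ \text{TB}} &= 4, \qquad \frac{840\ \text{VCores}}{480\ \text{VCores}} = 1.75
\end{align*}
```

So MPP EDW B has about four times the memory but only 1.75 times the cores of the LLAP setup.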
Comparing Query Performance of Hive LLAP vs. MPP-EDWs
• LLAP performed better than the MPP EDW-A system, which has similar infrastructure
• LLAP performance was comparable to MPP EDW-B, even though MPP EDW-B was provided with 4x the infrastructure
[Chart: Hive LLAP vs. MPP-A vs. MPP-B. Y-axis: Execution Time (seconds), 0 to 800. Queries grouped by time frame: 1 Week, 4 Weeks, 13 Weeks, 52 Weeks. Series: LLAP (Secs), MPP - Enterprise Data Warehouse A (Secs), MPP - Enterprise Data Warehouse B (Secs).]
Hive LLAP: Conclusion & Next Steps
• Promising product for low latency SQL access on top of Hadoop
• Significant cost savings vs. traditional MPP databases
• Not a one-size-fits-all solution
Next Steps:
• Evaluate Hive LLAP on HDP 3.x (Better Enterprise Support)
• Resource Plans & Workload Manager
• SSD Caching
• HS2I: HiveServer2 Interactive - High Availability
Thank You!
Abhishek Gupta
Data Engineer, Walmart
Abhishek.gupta2@Walmart.com
https://www.linkedin.com/in/gupta-abhishek/
Naveen Peddamail
Sr. Manager, Walmart
Naveen.Peddamail@walmart.com
https://www.linkedin.com/in/naveenpeddamail/
Questions?

More Related Content

What's hot

Admission Control in Impala
Admission Control in ImpalaAdmission Control in Impala
Admission Control in ImpalaCloudera, Inc.
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiFlink Forward
 
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureHadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureSkillspeed
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Simplilearn
 
How to Quantify the Value of Kafka in Your Organization
How to Quantify the Value of Kafka in Your Organization How to Quantify the Value of Kafka in Your Organization
How to Quantify the Value of Kafka in Your Organization confluent
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez Hortonworks
 
Apache spark 소개 및 실습
Apache spark 소개 및 실습Apache spark 소개 및 실습
Apache spark 소개 및 실습동현 강
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionCloudera, Inc.
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...DataWorks Summit/Hadoop Summit
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Cloudera, Inc.
 
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and CloudHBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and CloudMichael Stack
 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive TutorialSandeep Patil
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Simplilearn
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkDongwon Kim
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path ForwardAlluxio, Inc.
 

What's hot (20)

Admission Control in Impala
Admission Control in ImpalaAdmission Control in Impala
Admission Control in Impala
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureHadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
 
How to Quantify the Value of Kafka in Your Organization
How to Quantify the Value of Kafka in Your Organization How to Quantify the Value of Kafka in Your Organization
How to Quantify the Value of Kafka in Your Organization
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
 
Apache spark 소개 및 실습
Apache spark 소개 및 실습Apache spark 소개 및 실습
Apache spark 소개 및 실습
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An Introduction
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
An Introduction to Druid
An Introduction to DruidAn Introduction to Druid
An Introduction to Druid
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
 
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and CloudHBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive Tutorial
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmark
 
Scaling HBase for Big Data
Scaling HBase for Big DataScaling HBase for Big Data
Scaling HBase for Big Data
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path Forward
 

Similar to Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP Databases

Unlocking Big Data Insights with MySQL
Unlocking Big Data Insights with MySQLUnlocking Big Data Insights with MySQL
Unlocking Big Data Insights with MySQLMatt Lord
 
IMCSummit 2015 - 1 IT Business - The Evolution of Pivotal Gemfire
IMCSummit 2015 - 1 IT Business  - The Evolution of Pivotal GemfireIMCSummit 2015 - 1 IT Business  - The Evolution of Pivotal Gemfire
IMCSummit 2015 - 1 IT Business - The Evolution of Pivotal GemfireIn-Memory Computing Summit
 
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...Amazon Web Services
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointInside Analysis
 
How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)
How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)
How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)Ontico
 
Times ten 18.1_overview_meetup
Times ten 18.1_overview_meetupTimes ten 18.1_overview_meetup
Times ten 18.1_overview_meetupByung Ho Lee
 
Big Data with KNIME is as easy as 1, 2, 3, ...4!
Big Data with KNIME is as easy as 1, 2, 3, ...4!Big Data with KNIME is as easy as 1, 2, 3, ...4!
Big Data with KNIME is as easy as 1, 2, 3, ...4!KNIMESlides
 
Big Data as easy as 1, 2, 3, ... 4 ... with KNIME
Big Data as easy as 1, 2, 3, ... 4 ... with KNIMEBig Data as easy as 1, 2, 3, ... 4 ... with KNIME
Big Data as easy as 1, 2, 3, ... 4 ... with KNIMERosaria Silipo
 
Ten tools for ten big data areas 01 informatica
Ten tools for ten big data areas 01 informatica Ten tools for ten big data areas 01 informatica
Ten tools for ten big data areas 01 informatica Will Du
 
Open Sourcing GemFire - Apache Geode
Open Sourcing GemFire - Apache GeodeOpen Sourcing GemFire - Apache Geode
Open Sourcing GemFire - Apache GeodeApache Geode
 
An Introduction to Apache Geode (incubating)
An Introduction to Apache Geode (incubating)An Introduction to Apache Geode (incubating)
An Introduction to Apache Geode (incubating)Anthony Baker
 
Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...
Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...
Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...EMC
 
How to Integrate Hyperconverged Systems with Existing SANs
How to Integrate Hyperconverged Systems with Existing SANsHow to Integrate Hyperconverged Systems with Existing SANs
How to Integrate Hyperconverged Systems with Existing SANsDataCore Software
 
Building a Hadoop Powered Commerce Data Pipeline
Building a Hadoop Powered Commerce Data PipelineBuilding a Hadoop Powered Commerce Data Pipeline
Building a Hadoop Powered Commerce Data PipelineDataWorks Summit
 
MySQL High Availibility Solutions
MySQL High Availibility SolutionsMySQL High Availibility Solutions
MySQL High Availibility SolutionsMark Swarbrick
 
A new platform for a new era emc
A new platform for a new era   emcA new platform for a new era   emc
A new platform for a new era emcTaldor Group
 
MySQL High Availability Solutions - Feb 2015 webinar
MySQL High Availability Solutions - Feb 2015 webinarMySQL High Availability Solutions - Feb 2015 webinar
MySQL High Availability Solutions - Feb 2015 webinarAndrew Morgan
 
How to Effectively Plan for Disaster Recovery on AWS (CMP204-S) - AWS re:Inve...
How to Effectively Plan for Disaster Recovery on AWS (CMP204-S) - AWS re:Inve...How to Effectively Plan for Disaster Recovery on AWS (CMP204-S) - AWS re:Inve...
How to Effectively Plan for Disaster Recovery on AWS (CMP204-S) - AWS re:Inve...Amazon Web Services
 

Similar to Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP Databases (20)

AWS Database Services @ Scale
AWS Database Services @ ScaleAWS Database Services @ Scale
AWS Database Services @ Scale
 
Unlocking Big Data Insights with MySQL
Unlocking Big Data Insights with MySQLUnlocking Big Data Insights with MySQL
Unlocking Big Data Insights with MySQL
 
IMCSummit 2015 - 1 IT Business - The Evolution of Pivotal Gemfire
IMCSummit 2015 - 1 IT Business  - The Evolution of Pivotal GemfireIMCSummit 2015 - 1 IT Business  - The Evolution of Pivotal Gemfire
IMCSummit 2015 - 1 IT Business - The Evolution of Pivotal Gemfire
 
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 
How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)
How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)
How to run Real Time processing on Big Data / Ron Zavner (GigaSpaces)
 
Times ten 18.1_overview_meetup
Times ten 18.1_overview_meetupTimes ten 18.1_overview_meetup
Times ten 18.1_overview_meetup
 
Big Data with KNIME is as easy as 1, 2, 3, ...4!
Big Data with KNIME is as easy as 1, 2, 3, ...4!Big Data with KNIME is as easy as 1, 2, 3, ...4!
Big Data with KNIME is as easy as 1, 2, 3, ...4!
 
Big Data as easy as 1, 2, 3, ... 4 ... with KNIME
Big Data as easy as 1, 2, 3, ... 4 ... with KNIMEBig Data as easy as 1, 2, 3, ... 4 ... with KNIME
Big Data as easy as 1, 2, 3, ... 4 ... with KNIME
 
Ten tools for ten big data areas 01 informatica
Ten tools for ten big data areas 01 informatica Ten tools for ten big data areas 01 informatica
Ten tools for ten big data areas 01 informatica
 
Open Sourcing GemFire - Apache Geode
Open Sourcing GemFire - Apache GeodeOpen Sourcing GemFire - Apache Geode
Open Sourcing GemFire - Apache Geode
 
An Introduction to Apache Geode (incubating)
An Introduction to Apache Geode (incubating)An Introduction to Apache Geode (incubating)
An Introduction to Apache Geode (incubating)
 
Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...
Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...
Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...
 
How to Integrate Hyperconverged Systems with Existing SANs
How to Integrate Hyperconverged Systems with Existing SANsHow to Integrate Hyperconverged Systems with Existing SANs
How to Integrate Hyperconverged Systems with Existing SANs
 
SAP HANA on Power
SAP HANA on PowerSAP HANA on Power
SAP HANA on Power
 
Building a Hadoop Powered Commerce Data Pipeline
Building a Hadoop Powered Commerce Data PipelineBuilding a Hadoop Powered Commerce Data Pipeline
Building a Hadoop Powered Commerce Data Pipeline
 
MySQL High Availibility Solutions
MySQL High Availibility SolutionsMySQL High Availibility Solutions
MySQL High Availibility Solutions
 
A new platform for a new era emc
A new platform for a new era   emcA new platform for a new era   emc
A new platform for a new era emc
 
MySQL High Availability Solutions - Feb 2015 webinar
MySQL High Availability Solutions - Feb 2015 webinarMySQL High Availability Solutions - Feb 2015 webinar
MySQL High Availability Solutions - Feb 2015 webinar
 
How to Effectively Plan for Disaster Recovery on AWS (CMP204-S) - AWS re:Inve...
How to Effectively Plan for Disaster Recovery on AWS (CMP204-S) - AWS re:Inve...How to Effectively Plan for Disaster Recovery on AWS (CMP204-S) - AWS re:Inve...
How to Effectively Plan for Disaster Recovery on AWS (CMP204-S) - AWS re:Inve...
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP Databases

  • 1. © 2019 Walmart Inc. All Rights Reserved. Hive LLAP: A High Performance, Cost-Effective Alternative to Traditional MPP Databases. Any reference in this presentation to any specific commercial product, process, or service, or the use of any trade, firm, or corporation name is for information and convenience only and is not an endorsement, favor, or recommendation by Walmart Inc. Naveen Peddamail, Sr. Manager, Global Data; Abhishek Gupta, Data Engineer, Global Data.
  • 2. Agenda
      • Introduction to Walmart
      • Data Lake Initiative – Building a Single Source of Truth
      • Challenges Around Low Latency Querying on Hadoop – Hive LLAP as a Solution
      • Performance & Cost Effectiveness of Hive LLAP vs. MPP Databases
      • Conclusion & Next Steps
      • Q & A
  • 3. About Walmart
      • Largest retailer in the world and Fortune 1 company
      • Serves over 275M customers weekly
      • Employs over 2.2M associates worldwide
      • 11,300 stores under 58 banners in 27 countries
      • eCommerce websites in 10 countries; brands include Walmart.com, Jet.com, Hayneedle.com (home furnishings), Shoes.com (footwear), Moosejaw (outdoor apparel and gear), ModCloth (women’s apparel), and Bonobos (men’s apparel)
      • To find out more, visit us at https://corporate.walmart.com
    About Walmart Labs
      • Employs over 4,000 associates worldwide
      • Development centers in the US, India, and Ireland
      • Open source projects include Hapi (server framework for Node.js), OneOps (cloud management platform), Electrode (universal React/Node.js platform), and TestArmada (suite of testing tools)
      • Includes the Global Data and Analytics Platform team
      • To find out more, visit us at: https://www.walmartlabs.com, https://www.facebook.com/WalmartLabs, https://twitter.com/WalmartLabs, https://github.com/walmartlabs
  • 4. Data Lake Initiative – Building a Single Source of Truth
    Data Landscape at Walmart
      • Transactional systems from various domains generate huge volumes of data every second: Sales & Orders, Merchandising, Logistics & Supply Chain, Real Estate, HR Systems, Compliance
      • Analytical & reporting databases are spread across various platforms and teams
      • Correctly identifying the source of truth was a challenge
      • Data quality, governance, metadata management & lineage were difficult to manage
      • Need to build a single source of truth – the Data Lake
  • 5. Criteria for the Data Lake
      • Governed, Secured & Certified Data
      • Single Source of Truth
      • Lower Total Cost of Ownership
      • Robust and Fast Data Access & Reporting
  • 6. Data Lake @ Walmart
      • 01 Central Analytical Data Source: central true source of analytical data across Walmart
      • 02 Data Service Layer: common services for metadata; ETL pipeline; data quality framework
      • 03 Governed and Secure: roles to manage access control; encryption for sensitive data elements; end-to-end lineage
      • 04 Self Service Platform: enables ad-hoc analysis; improves speed to market for analysis; provides a self-service storage and compute platform
  • 7. Are the Business Users Happy Now?
      • Governed, Secured & Certified Data
      • Single Source of Truth
      • Lower Total Cost of Ownership
      • Robust and Fast Data Access & Reporting
  • 8. Low Latency Querying on Hadoop; Hive LLAP as a Solution
    Challenges
      • Ad hoc query performance on Hadoop/Hive was poor
      • Users benchmarked against Massively Parallel Processing enterprise data warehouses (MPP EDWs)
      • Migrating some teams off enterprise data warehouses was not possible until better query response times could be guaranteed
      • Queries migrated from other data warehouses were not optimal for querying on Hive
    Potential Solutions
      • Tune queries for optimal Hive performance
      • Recommend Tez as the default execution engine
      • Hive LLAP as a performance booster
  • 9. Hive LLAP: Low Latency Analytical Processing (also known as Live Long and Process)
      • JIT optimization & in-memory caching
      • Data sharing, asynchronous IO
      • Leverages long-lived daemons
      • Bridges inefficiencies of execution engines
  • 10. Hive LLAP Architecture. Source: https://hortonworks.com/blog/top-5-performance-boosters-with-apache-hive-llap/
  • 11. Hive LLAP – Reviewing TPC-DS Benchmarks on HDP 2.6. Source: https://hortonworks.com/blog/3x-faster-interactive-query-hive-llap/
    Similarities to Walmart’s Data Model
      • 10 TB scale, and the data model for the underlying tables was similar to our use case
      • Hive LLAP benchmarks looked promising for TPC-DS data
    Differences from Walmart’s Data Model
      • Wider tables
      • Complex dimension tables
  • 12. Hive LLAP – POC Data Model
    POC Goals
      • Benchmark Hive LLAP query performance on 3NF tables involving joins
      • Compare Hive LLAP query performance vs. MPP-EDWs on the same set of queries
  • 13. Hive LLAP – Environment Setup
      • Hadoop distribution: HDP 2.6.3
      • YARN scheduler: Capacity Scheduler with pre-emption enabled
      • Number of LLAP nodes: two configurations, 10 nodes and 15 nodes
      • Hardware: 256 GB RAM, 32 cores, and 14 × 6 TB disks per node. Incremental spend: ~$150K
      • Overall Hadoop cluster: 90 nodes
  • 14. Hive LLAP – Environment Setup
    YARN Config
      • Nodemanager Max Container Size (MB): 230400
      • Number of LLAP nodes: 10 & 15 (two variations)
    LLAP Configs
      • hive.llap.execution.mode: all
      • hive.llap.io.memory.mode: cache
      • hive.llap.io.enabled: TRUE
      • Slider Memory: 2048
      • tez.am.resource.memory.mb: 2048
      • LLAP Daemon Container Max Headroom: 8192
      • Number of concurrent queries: 10
      • Memory per Daemon: 226304
      • Number of executors per LLAP Daemon: 44
      • hive.llap.io.threadpool.size: 44
      • LLAP Daemon Heap Size (MB): 171213
      • In-Memory Cache per Daemon (MB): 46899
  • 15. Hive LLAP – Query Patterns & Stats
    Query Characteristics
      • Queries fall mainly into reporting & ad-hoc workloads with a focus on business applications
      • Aggregations of key metrics across various location, item & timeframe dimensions
      • Scans involving large tables & joins on multiple tables
      • Sorting across various dimensions & facts
      • 48 queries over 4 time frames
    Table Stats
      • Fact table (1 year of data): ~70 billion rows, 12 TB
      • Dimensions (1 key table): ~25 million rows, 110 GB
    Sample Query
      SELECT l.column1, l.column2, i.column3, i.column4, d.column5,
             sum(s.column6), sum(s.column7), avg(s.column8), avg(s.column9) …
      FROM sales AS s
      JOIN item_dim AS i ON s.item_id = i.item_id
      JOIN location_dim AS l ON s.location_id = l.location_id
      JOIN date_dim AS d ON s.visit_dt = d.cal_dt
      WHERE s.column10 BETWEEN <val1> AND <val2>
        AND l.column11 = <val3> …
      GROUP BY l.column1, l.column2, i.column3, i.column4, d.column5
      ORDER BY l.column1, l.column2, i.column3, i.column4, d.column5;
  • 16. Hive LLAP – Results. [Chart: Hive LLAP Performance Benchmark; y-axis: execution time (seconds), 0 to 450; time frames: 1 Week, 4 Weeks, 12 Weeks, 52 Weeks.] 75% of the queries ran in < 100 secs.
  • 17. Hive LLAP – Results. [Chart: Hive LLAP Query Performance for 10 vs. 15 Nodes – Linear Scalability; x-axis: queries; y-axis: execution time (seconds), 0 to 600; series: LLAP 15 Nodes, LLAP 10 Nodes; time frames: 1 Week, 4 Weeks, 12 Weeks, 52 Weeks.] 30%–50% performance improvement between the 10-node and 15-node configurations.
  • 18. Comparing Query Performance of Hive LLAP vs. MPP-EDWs
      • For our comparative analysis, we used two MPP-EDW clusters
      • Queries in the MPP-EDW clusters were optimized for best performance
    Clusters
      • Hadoop Cluster: ~4 TB memory, 480 VCores
      • MPP EDW A: ~4 TB memory, 512 VCores
      • MPP EDW B: ~16 TB memory, 840 VCores
  • 19. Comparing Query Performance of Hive LLAP vs. MPP-EDWs
      • LLAP performed better than the MPP EDW-A system, which had similar infrastructure
      • LLAP was comparable to MPP EDW-B, given that the MPP system had about 4x the infrastructure
    [Chart: Hive LLAP vs. MPP-A vs. MPP-B; y-axis: execution time (seconds), 0 to 800; series: LLAP, MPP Enterprise Data Warehouse A, MPP Enterprise Data Warehouse B; time frames: 1 Week, 4 Weeks, 13 Weeks, 52 Weeks.]
  • 20. Hive LLAP: Conclusion & Next Steps
      • Promising product for low latency SQL access on top of Hadoop
      • Significant cost savings vs. traditional MPP databases
      • Not a one-size-fits-all solution
    Next Steps
      • Evaluate Hive LLAP on HDP 3.x (better enterprise support)
      • Resource Plans & Workload Manager
      • SSD caching
      • HS2I: HiveServer2 Interactive – High Availability
  • 21. Thank You!
      • Abhishek Gupta, Data Engineer, Walmart. Abhishek.gupta2@Walmart.com, https://www.linkedin.com/in/gupta-abhishek/
      • Naveen Peddamail, Sr. Manager, Walmart. Naveen.Peddamail@walmart.com, https://www.linkedin.com/in/naveenpeddamail/
  • 22. Questions?
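
The sizing figures quoted on slides 13, 14, 17, and 18 can be cross-checked with a little arithmetic. The sketch below is only that, a sketch: every number is copied from the slides, but the formulas are assumptions the slides do not state. It assumes all memory settings are in MB, that the daemon container is the sum of heap, cache, and headroom, that the NodeManager container limit also has to fit the Slider and Tez application masters, that the "Hadoop Cluster" on slide 18 is the 15-node LLAP configuration, and that "performance improvement" means a reduction in execution time. The function and constant names are made up for illustration.

```python
"""Cross-checks of figures from slides 13, 14, 17 and 18 (formulas are assumptions)."""
from typing import Tuple

# Slide 14: per-node memory settings, assumed to all be in MB.
NODEMANAGER_MAX_CONTAINER_MB = 230400
DAEMON_MEMORY_MB = 226304
DAEMON_HEAP_MB = 171213
DAEMON_CACHE_MB = 46899
DAEMON_HEADROOM_MB = 8192
SLIDER_AM_MB = 2048
TEZ_AM_MB = 2048

# Slide 13: hardware per LLAP node.
RAM_PER_NODE_GB = 256
CORES_PER_NODE = 32

# Slide 18: clusters used in the comparison.
CLUSTERS = {
    "Hadoop Cluster": {"memory_tb": 4, "vcores": 480},
    "MPP EDW A": {"memory_tb": 4, "vcores": 512},
    "MPP EDW B": {"memory_tb": 16, "vcores": 840},
}


def daemon_budget_mb() -> int:
    """Heap + in-memory cache + headroom (assumed make-up of one daemon container)."""
    return DAEMON_HEAP_MB + DAEMON_CACHE_MB + DAEMON_HEADROOM_MB


def node_budget_mb() -> int:
    """Daemon container plus the Slider and Tez application masters (assumed)."""
    return DAEMON_MEMORY_MB + SLIDER_AM_MB + TEZ_AM_MB


def llap_totals(nodes: int) -> Tuple[int, int]:
    """Total RAM in GB and total cores across the given number of LLAP nodes."""
    return nodes * RAM_PER_NODE_GB, nodes * CORES_PER_NODE


def ideal_runtime_reduction(old_nodes: int, new_nodes: int) -> float:
    """Fractional runtime drop if runtime were inversely proportional to node count."""
    return 1 - old_nodes / new_nodes


def ratio_to_hadoop(cluster: str, key: str) -> float:
    """How many times larger a cluster is than the Hadoop cluster for one resource."""
    return CLUSTERS[cluster][key] / CLUSTERS["Hadoop Cluster"][key]


if __name__ == "__main__":
    print("daemon budget (MB):", daemon_budget_mb())
    print("node budget (MB):", node_budget_mb())
    print("15-node RAM (GB), cores:", llap_totals(15))
    print("ideal 10 -> 15 node runtime reduction:", round(ideal_runtime_reduction(10, 15), 3))
    print("EDW B memory ratio:", ratio_to_hadoop("MPP EDW B", "memory_tb"))
    print("EDW B vcore ratio:", ratio_to_hadoop("MPP EDW B", "vcores"))
```

Under these assumptions the slide figures line up: 171213 + 46899 + 8192 = 226304 MB matches the quoted memory per daemon, and 226304 + 2048 + 2048 = 230400 MB matches the NodeManager container limit. Fifteen nodes give 3840 GB of RAM and 480 cores, in line with the "~4 TB, 480 VCores" Hadoop cluster on slide 18. Ideal linear scaling from 10 to 15 nodes would cut runtime by about 33%, which falls inside the 30%–50% range reported on slide 17.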

Editor's Notes

  1. We are a small Fortune 1 retailer from a small place back in NW Arkansas, with a footprint in 27 countries and multiple online brands. WalmartLabs is the technology backbone for Walmart, and we are located globally. The Global D&A platform is part of WalmartLabs.
  2. https://hortonworks.com/blog/top-5-performance-boosters-with-apache-hive-llap/