Using BigBench to compare Hive and Spark
Nicolas Poggi, Alejandro Montero
April 2017
Outline
1. Intro to BSC and ALOJA
2. BigBench
3. Sequential tests
   1. Data scales
4. Concurrency tests
5. Summary
Barcelona Supercomputing Center (BSC)
• Spanish national supercomputing center with a 22-year history in:
• Computer architecture, networking, and distributed systems research
• Based at BarcelonaTech University (UPC)
• Large ongoing life science computational projects
• Prominent body of research activity around Hadoop
• 2008-2013: SLA Adaptive Scheduler, Accelerators, Locality Awareness, Performance Management (7 publications)
• 2013-Present: Cost-efficient upcoming Big Data architectures (ALOJA) (8+ publications)
ALOJA: towards cost-effective Big Data
• Research project for automating the characterization and optimization of Big Data deployments
• Open source Benchmarking-to-Insights platform and tools
• Largest Big Data public repository (70,000+ jobs)
• Community collaboration with industry and academia
http://aloja.bsc.es
Platform components: Big Data Benchmarking, Online Repository, Web / ML Analytics
Benchmarking and BigBench
The need for a new benchmark standard
• A benchmark captures the solution to a problem and guides decision making
• Database-related benchmark standards:
• Transactional (OLTP): TPC-C and TPC-E
• Decision Support (DSS/OLAP): TPC-H and TPC-DS
• And for Big Data analytics properties?
• 3 Vs, ML, M/R
• Benchmark uses:
• System tuning and debugging
• Broad, dispersed Big Data ecosystem
• Set common rules
• Vendor comparison
• Transparency across the industry
What is BigBench (TPCx-BB [1])?
• End-to-end application-level benchmark
• Result of many years of collaboration between industry and academia
• Covers most Big Data analytical properties (3Vs)
• Based on a retailer company (extension of TPC-DS)
[1]: http://www.tpc.org/tpc_documents_current_versions/pdf/tpcx-bb_v1.2.0.pdf
BigBench history
• 2012: Launched at WBDB
• 2013: Published at SIGMOD
• 2014: First implementation
• 2016 (Feb): Standardized by TPC
• 2016 (Nov): TPCx-BB Version 1.2
BigBench use cases and process overview
• 30 business use cases covering:
• Merchandising,
• Pricing Optimization
• Product Return
• Customers...
• Implementation resulted in:
• 14 declarative queries (SQL)
• 7 queries with Natural Language Processing
• 4 queries with data preprocessing using MapReduce jobs
• 5 queries with Machine Learning post-processing
Process overview:
• Data generation → Data loading → Power test → Throughput test 1 → Data refresh → Throughput test 2
• Result: BB queries / hour
BigBench v1.2 – Reference Implementation
Software stack (bottom to top):
• Filesystem: HDFS
• Table Metastore: Hive Metastore
• Execution Engine: MapReduce, Tez, Spark (on Yarn)
• SQL Engine: Hive, Spark SQL
• Machine Learning: Mahout ML, custom Spark MLlib
Benchmarked systems:
• Hive + MapReduce + Mahout
• Hive + MapReduce + Spark_MLlib
• Hive + Tez + Mahout
• Hive + Tez + Spark_MLlib
• Spark SQL + Mahout
• Spark SQL + Spark_MLlib
• Spark 2 SQL + Mahout
Work in progress:
• Hive 2
• Spark 2 SQL + Spark_MLlib
The cluster (I) – HDInsight PaaS
• Model: D4v2
• Head nodes: 2
• Worker nodes: 4
• Zookeeper nodes: 3
• CPU: Intel(R) Xeon(R) CPU E5-2673 v3, 8 x 2.4 GHz cores
• RAM: 28 GB
• HDFS: Remote
• Software: Hortonworks Data Platform 2.5
Software configuration
• Mapper/Reducer/Tez memory: 1536 MB
• Mapper/Reducer/Tez heap space: 1024 MB
• Mapper/Reducer/Tez cores: 1
• Hive MapJoins: Yes (non-conditional map join conversion for small tables under 319 MB)
• Spark executors: 4
• Spark executor memory: 4608 MB
• Spark executor cores: 3
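The Spark settings listed for this cluster line up with the standard `spark-submit` options for YARN (`--num-executors`, `--executor-memory`, `--executor-cores`). The slides do not say how the authors applied them, so the following is only a sketch: the helper name `spark_submit_flags` is hypothetical, and passing the values as command-line flags is an assumption.

```python
from typing import List


def spark_submit_flags(executors: int, executor_memory_mb: int, executor_cores: int) -> List[str]:
    """Hypothetical helper: turn the slide's Spark settings into spark-submit flags."""
    return [
        "--num-executors", str(executors),
        "--executor-memory", f"{executor_memory_mb}m",
        "--executor-cores", str(executor_cores),
    ]


# Values taken from the software configuration of cluster (I).
cluster_1_flags = spark_submit_flags(executors=4, executor_memory_mb=4608, executor_cores=3)
print(" ".join(cluster_1_flags))

# Resources requested by the executors in total.
total_executor_cores = 4 * 3
total_executor_memory_mb = 4 * 4608
print(total_executor_cores, "cores,", total_executor_memory_mb, "MB")
```

With these values the executors request 12 cores and 18432 MB in total, spread over the 4 worker nodes.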
Sequential runs (power)
Queries 1-30
Average of three executions using 100 GB Scale Factor
BigBench workload – power test
Data Generation (HDFS) → Load to Hive Metastore (Hive) → Query 1 → Query 2 → … → Query 30
Pure QL
Execution time in seconds (average of three executions using 100 GB Scale Factor):
• Hive_MR: 6223
• Hive_Tez: 1601
• Spark_1.6.2: 1848
• Spark_2.0.1: 1457
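To make the Pure QL comparison easier to read, the reported times can be turned into speedups relative to Hive on MapReduce. This is a small sketch using only the four numbers from the chart; the function name `speedup_over` is an illustrative choice, not part of BigBench or ALOJA.

```python
# Pure QL execution times in seconds, as reported on the slide.
pure_ql_seconds = {
    "Hive_MR": 6223,
    "Hive_Tez": 1601,
    "Spark_1.6.2": 1848,
    "Spark_2.0.1": 1457,
}


def speedup_over(baseline: str, timings: dict) -> dict:
    """Return baseline_time / engine_time for every engine (hypothetical helper)."""
    base = timings[baseline]
    return {name: round(base / secs, 2) for name, secs in timings.items()}


speedups = speedup_over("Hive_MR", pure_ql_seconds)
for name, factor in sorted(speedups.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {factor}x")
```

A value above 1 means the engine finished the Pure QL queries faster than Hive on MapReduce.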
Query 12 CPU behavior
Panels: Tez, Spark 1.6.2, Spark 2.0.2
Average of three executions using 100 GB Scale Factor
Custom Reducers
Execution time in seconds (average of three executions using 100 GB Scale Factor):
• Hive_MR: 2815
• Hive_Tez: 1122
• Spark_1.6.2: 1629
• Spark_2.0.1: 1466
Query 2 CPU behavior
Panels: Tez, Spark 1.6.2, Spark 2.0.2
Average of three executions using 100 GB Scale Factor
Natural Language Processing
Execution time in seconds (average of three executions using 100 GB Scale Factor):
• Hive_MR: 5004
• Hive_Tez: 1100
• Spark_1.6.2: 2289
• Spark_2.0.1: 1913
Query 27 CPU behavior
Panels: Tez, Spark 1.6.2, Spark 2.0.2
Average of three executions using 100 GB Scale Factor
Machine Learning
Execution time in seconds (average of three executions using 100 GB Scale Factor):
• Hive_MR + Mahout: 3550
• Hive_MR + Spark_ML: 1613
• Hive_Tez + Mahout: 3769
• Hive_Tez + Spark_ML: 1898
• Spark_1.6.2 + Mahout: 4045
• Spark_1.6.2 + Spark_ML: 1937
• Spark_2.0.1 + Mahout: 3390
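The Machine Learning numbers pair each SQL engine with Mahout and, where measured, with Spark MLlib. The sketch below computes how much time the MLlib variant saves for each engine that has both measurements; Spark_2.0.1 is skipped because the slide reports only its Mahout run. The dictionary layout and variable names are illustrative assumptions.

```python
# ML query times in seconds, grouped by SQL engine (values from the slide).
ml_seconds = {
    "Hive_MR": {"Mahout": 3550, "Spark_MLlib": 1613},
    "Hive_Tez": {"Mahout": 3769, "Spark_MLlib": 1898},
    "Spark_1.6.2": {"Mahout": 4045, "Spark_MLlib": 1937},
    "Spark_2.0.1": {"Mahout": 3390},  # MLlib run listed as work in progress
}

reduction_pct = {}
for engine, times in ml_seconds.items():
    if "Spark_MLlib" not in times:
        continue  # no paired measurement to compare against
    saved = times["Mahout"] - times["Spark_MLlib"]
    reduction_pct[engine] = round(100 * saved / times["Mahout"], 1)
    print(f"{engine}: {saved} s saved ({reduction_pct[engine]}% less time)")
```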
Query 5 CPU behavior
Panels: Tez + Mahout, Tez + Spark_MLlib
Average of three executions using 100 GB Scale Factor
Aggregated Results
Execution time in seconds (average of three executions using 100 GB Scale Factor):
System | Pure-QL | Custom Reducers | NLP | ML | Total
Hive_MR + Mahout | 6223 | 2815 | 5004 | 3550 | 17592
Hive_MR + Spark_ML | 6223 | 2815 | 5004 | 1613 | 15655
Hive_Tez + Mahout | 1601 | 1122 | 1100 | 3769 | 7592
Hive_Tez + Spark_ML | 1623 | 1137 | 1082 | 1898 | 5740
Spark_1.6.2 + Mahout | 1848 | 1629 | 2289 | 4045 | 9811
Spark_1.6.2 + Spark_ML | 1951 | 1489 | 2356 | 1937 | 7733
Spark_2.0.1 + Mahout | 1457 | 1466 | 1913 | 3390 | 8226
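The totals in the aggregated chart are the sum of the four query categories (Pure-QL, Custom Reducers, NLP, ML). The following sketch recomputes them from the per-category values shown on the slide and ranks the seven combinations from fastest to slowest. The variable names are illustrative.

```python
# Per-category times in seconds: (Pure-QL, Custom Reducers, NLP, ML).
category_seconds = {
    "Hive_MR + Mahout": (6223, 2815, 5004, 3550),
    "Hive_MR + Spark_ML": (6223, 2815, 5004, 1613),
    "Hive_Tez + Mahout": (1601, 1122, 1100, 3769),
    "Hive_Tez + Spark_ML": (1623, 1137, 1082, 1898),
    "Spark_1.6.2 + Mahout": (1848, 1629, 2289, 4045),
    "Spark_1.6.2 + Spark_ML": (1951, 1489, 2356, 1937),
    "Spark_2.0.1 + Mahout": (1457, 1466, 1913, 3390),
}

# Total time per combination is the sum over the four categories.
totals = {system: sum(parts) for system, parts in category_seconds.items()}

# Rank combinations by total time, fastest first.
ranking = sorted(totals, key=totals.get)
for position, system in enumerate(ranking, start=1):
    print(f"{position}. {system}: {totals[system]} s")
```

The recomputed sums match the totals printed on the chart, with Hive_Tez + Spark_ML as the fastest combination at 5740 seconds.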
Scaling from 1GB to 1TB
Log scales
Concurrency runs (throughput)
2, 4, 8 parallel streams
BigBench workload – Throughput test
Data Generation (HDFS) → Load Data (Hive) → parallel query streams, each running the queries in a different order:
• Query 15 → Query 21 → … → Query 16
• Query 12 → Query 18 → … → Query 22
• Query 16 → Query 30 → … → Query 29
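In the throughput test each parallel stream runs all 30 queries, but in its own order. The sketch below illustrates that idea only. The benchmark kit defines its own per-stream query order, so the seeded shuffle used here is a stand-in, and the function name `stream_orders` is hypothetical.

```python
import random
from typing import List


def stream_orders(num_streams: int, num_queries: int = 30, seed: int = 0) -> List[List[int]]:
    """Return one query order per stream; every stream contains each query exactly once.

    Assumption: a seeded shuffle replaces the benchmark's predefined placement.
    """
    orders = []
    for stream in range(num_streams):
        rng = random.Random(seed + stream)  # separate, reproducible order per stream
        order = list(range(1, num_queries + 1))
        rng.shuffle(order)
        orders.append(order)
    return orders


# The slides test 2, 4, and 8 parallel streams.
for streams in (2, 4, 8):
    orders = stream_orders(streams)
    print(streams, "streams, first queries:", [order[0] for order in orders])
```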
The cluster (II) – HDInsight PaaS
• Model: HDInsight D4v3
• Head nodes: 2
• Worker nodes: 7
• Zookeeper nodes: 3
• CPU: Intel(R) Xeon(R) CPU E5-2673 v3, 8 x 2.4 GHz cores
• RAM: 28 GB; 55 GB (head node)
• HDFS: Remote
• Software: Hortonworks Data Platform 2.5
Software configuration
• Mapper/Reducer/Tez memory: 1536 MB
• Mapper/Reducer/Tez heap space: 1024 MB
• Mapper/Reducer/Tez cores: 1
• Hive MapJoins: Yes (non-conditional map join conversion for small tables under 319 MB)
• Spark executors: 9
• Spark executor memory: 4608 MB
• Spark executor cores: 3
Spark vs Hive + Tez in throughput tests
[Figure: execution time (s) of Hive+Tez and Spark for throughput tests with 2, 4, and 8 streams]
[Figures: finished containers and total number of containers over time, one panel each for Hive + Tez and Spark]
Throughput test 8 Streams CPU behavior
Panels: Tez + Spark_MLlib, Spark 1.6.2 + Spark_MLlib
Summary
Average of three executions using 100 GB Scale Factor
Conclusions (I)
• Hive on Tez improves SQL performance over Hive on MapReduce.
• It is also faster than Hive on Spark 1.
• Hive on Spark 2 is slightly faster.
• The Spark implementation is based on Hive…
• Could be faster on a native Spark implementation
• Spark MLlib has improved performance over Mahout.
• Best production combination on HDI: Apache Tez for SQL + Spark MLlib for Machine Learning.
Conclusions (II)
• In concurrency scenarios Spark gave better results than Hive on Tez
• Spark shows a constant container execution and leverages in-mem caching
• Container execution follows an almost linear progression.
• Hive on Tez shows important spikes in container execution.
• Spark uses more containers than Hive on Tez when scaling up.
• Queueing times don’t seem to be an issue in either engine.
• BigBench is new, but it can already help us guide our decision making
• It needs more Big Data engines to be added
• We hope more vendors and providers start releasing audited results
Resources and references
BigBench and ALOJA
• Original BigBench implementation repository
• https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench
• ALOJA benchmarking platform
• https://github.com/Aloja/aloja
• http://aloja.bsc.es/publications
• ALOJA fork of BigBench (adds support for HDI and fixes Spark)
• https://github.com/Aloja/Big-Data-Benchmark-for-Big-Bench
• TPCx-BB reference:
• http://www.tpc.org/tpc_documents_current_versions/pdf/tpcx-bb_v1.2.0.pdf
• Evaluating Hive and Spark SQL with BigBench – T. Ivanov et al.
• http://arxiv.org/ftp/arxiv/papers/1512/1512.08417.pdf
• BigBench SIGMOD 2013 paper
• http://dl.acm.org/citation.cfm?id=2463712&CFID=878332641&CFTOKEN=16933633
• The State of SQL-on-Hadoop in the Cloud – N. Poggi et al.
• https://doi.org/10.1109/BigData.2016.7840751
Big Data Benchmarking
• Big Data Benchmarking Community (BDBC) mailing list
• (~200 members from ~80 organizations)
• http://clds.sdsc.edu/bdbc/community
• Workshop on Big Data Benchmarking (WBDB)
• http://clds.sdsc.edu/bdbc/workshops
• SPEC Research Big Data working group
• http://research.spec.org/working-groups/big-data-working-group.html
• Benchmarking slides and video:
• Benchmarking Hadoop:
• https://www.slideshare.net/ni_po/benchmarking-hadoop
• Michael Frank on Big Data benchmarking
• http://www.tele-task.de/archive/podcast/20430/
• Tilmann Rabl Big Data Benchmarking Tutorial
• http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl
Thanks, questions?
Follow up / feedback : Nicolas.Poggi@bsc.es
 

Using BigBench to compare Hive and Spark (Long version)