1 © Hortonworks Inc. 2011–2018. All rights reserved.
What is new in Apache Hive?
Ashutosh Chauhan
Apache Hive – Distant Past – First Five Years
• Initial use case: batch processing
• Circa 2008
• Read-only data
• MapReduce
• HiveQL
Apache Hive – Past 5 Years
• Effort to take Hive beyond its batch processing roots
• Started in Apache Hive 0.10.0 (January 2013)
• Latest released version: Apache Hive 3.0 (May 2018)
• Extensive renovation along four different axes
• Runtime : Enable sub-second queries - LLAP
• Compiler : Cost Based Optimizer
• SQL support : Improved coverage of SQL syntax
• Transactional Support : ACID
Hive – Today
• Comprehensive ANSI SQL including all TPC-DS Queries.
• The only Hadoop SQL with ACID MERGE for easy updates.
• In-Memory caching for MPP performance at Hadoop scale.
• Enables Per-User dynamic row and column security.
• Enables Replication and DR for critical workloads.
• Compatible with every major BI Tool.
• Proven at 300+ PB Scale.
Apache Hive: Fast Facts
Most Queries Per Hour
100,000 Queries Per Hour
Analytics Performance
100 Million rows/s Per Node
Largest Hive Warehouse
300+ PB Raw Storage
Largest Cluster
4,500+ Nodes
Hive: Serving ETL Workloads to BI Systems
[Figure: BI systems served by Hive features: Materialized view, Improved Stats, Constraints, Query Result Cache, Workload management, ACID v2]
• Results return from HDFS/cache directly
• Reduce load from repetitive queries
• Allows more queries to be run in parallel
• Reduce resource starvation in large clusters
• Also: Active/Passive HA
• More “tools” for optimizer to use
• More “tools” for DBAs to tune/optimize
• Invisible tuning of DB from users’ perspective
• ACID v2 is as fast as regular tables
• SIGMOD Software Systems Award
• “For developing seminal software systems that served to bring relational-style declarative programming to the Hadoop ecosystem.”
• Previous recipients include Postgres, SQLite and MonetDB
Hive – How Did We Get Here?
• LLAP Enhancements
• CBO Enhancements
• ACID Enhancements
Materialized Views in Hive
Accelerating Query Processing
• Change data physical properties (distribute, sort)
• Filter rows
• Denormalize
• Preaggregate
Optimization based on access patterns
Materialized Views to Rescue
• Speed up aggregates and joins via MVs
• View navigation via CBO/Calcite
• Optionally allow rewrites against out-of-date materializations
Materialized Views in Hive 3
• Multiple storage options: Hive, Druid
• Multiple options to control materialized views lifecycle
Materialized View-based Rewriting
• Materialized view definition
CREATE MATERIALIZED VIEW mv AS
SELECT <dims>,
lo_revenue,
lo_extendedprice * lo_discount AS d_price,
lo_revenue - lo_supplycost
FROM
customer, dates, lineorder, part, supplier
WHERE
lo_orderdate = d_datekey
and lo_partkey = p_partkey
and lo_suppkey = s_suppkey
and lo_custkey = c_custkey;
• Query
SELECT sum(lo_extendedprice*lo_discount)
FROM
lineorder, dates
WHERE
lo_orderdate = d_datekey
and d_year = 2013
and lo_discount between 1 and 3;
• Materialized view-based rewriting
SELECT SUM(d_price)
FROM mv
WHERE
d_year = 2013
and lo_discount between 1 and 3;
[Figure: supplier, part, dates, customer and lineorder are joined into the mv contents, from which the query results are produced]
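The rewrite above can be checked on a toy data set. The following is a minimal sketch, not Hive itself: it uses Python's built-in sqlite3 module, a plain table named mv as a stand-in for the materialized view, and made-up sample rows. It only demonstrates that the original query and the rewritten query return the same answer when the view keeps d_year, lo_discount and d_price.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dates (d_datekey INTEGER PRIMARY KEY, d_year INTEGER);
CREATE TABLE lineorder (lo_orderdate INTEGER, lo_extendedprice INTEGER, lo_discount INTEGER);
INSERT INTO dates VALUES (20130101, 2013), (20140101, 2014);
INSERT INTO lineorder VALUES
  (20130101, 100, 2),
  (20130101, 200, 5),
  (20140101, 300, 1),
  (20130101, 50, 3);
-- Stand-in for the materialized view: the join result stored as a table.
CREATE TABLE mv AS
SELECT d_year, lo_discount, lo_extendedprice * lo_discount AS d_price
FROM lineorder, dates
WHERE lo_orderdate = d_datekey;
""")

# Query as the user wrote it, against the base tables.
original = conn.execute("""
SELECT SUM(lo_extendedprice * lo_discount)
FROM lineorder, dates
WHERE lo_orderdate = d_datekey
  AND d_year = 2013
  AND lo_discount BETWEEN 1 AND 3
""").fetchone()[0]

# Query as the optimizer could rewrite it, against the stored join result.
rewritten = conn.execute("""
SELECT SUM(d_price)
FROM mv
WHERE d_year = 2013
  AND lo_discount BETWEEN 1 AND 3
""").fetchone()[0]

print(original, rewritten)
```

The rewritten form skips the join entirely, which is where the speed-up in the slide comes from.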
Rebuilding Materialized Views
• ALTER MATERIALIZED VIEW [db_name.]materialized_view_name REBUILD;
• Incremental materialized view maintenance
• Only refresh data that has changed in source tables
Accelerating Query Processing with Materialized Views in Apache Hive
Jesus Camacho Rodriguez
Tuesday, June 19
2:50 PM - 3:30 PM
Executive Ballroom 210A/E
Workload Management
Overview
• Effectively share LLAP cluster resources
• Resource allocation per user policy; separate ETL and BI, etc.
• Resource-based guardrails
• Protect against long running queries, high memory usage
• Improved, query-aware scheduling
• Scheduler is aware of query characteristics, types, etc.
• Fragments are easier to pre-empt than containers
• Queries get guaranteed fractions of the cluster, but can use empty space
Resource Plans
• Resource plan is a workload management configuration for a cluster
• Switching is allowed without stopping queries, e.g. based on time of day
• Cluster is divided into query pools (optionally nested)
• Each pool defines query parallelism, cluster resources percentage
• Queries are automatically routed to pools based on user name, app, etc.
• Rules (Triggers) to kill, move, or deprioritize queries based on DFS usage, runtime, etc.
• Example :
CREATE RESOURCE PLAN daytime;
CREATE POOL bi IN daytime (resource_percent=75, concurrent_queries=5);
CREATE POOL etl IN daytime (resource_percent=25, concurrent_queries=10);
CREATE RULE downgrade IN daytime WHEN total_runtime > 120 THEN MOVE etl;
ADD RULE downgrade TO bi IN daytime ;
CREATE MAPPING tableau IN daytime (application='Tableau', pool=bi);
ALTER PLAN daytime SET default_pool='etl';
APPLY PLAN daytime;
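The routing and rule behaviour in the example plan can be sketched in a few lines. This is a toy model under stated assumptions, not Hive's workload manager: the Query class, route and apply_triggers are hypothetical names, and the runtime threshold is treated as a unitless number exactly as it appears in the slide.

```python
from dataclasses import dataclass

@dataclass
class Query:
    application: str
    pool: str = ""
    total_runtime: int = 0  # same (unstated) unit as the rule in the slide

# Mirrors the example plan: Tableau maps to bi, everything else to etl.
MAPPINGS = {"Tableau": "bi"}
DEFAULT_POOL = "etl"
# pool -> list of (runtime threshold, pool to move to)
TRIGGERS = {"bi": [(120, "etl")]}

def route(query):
    """Place a new query in a pool based on its application name."""
    query.pool = MAPPINGS.get(query.application, DEFAULT_POOL)

def apply_triggers(query):
    """Move the query if it breaks a rule attached to its current pool."""
    for threshold, target in TRIGGERS.get(query.pool, []):
        if query.total_runtime > threshold:
            query.pool = target
            break

dashboard = Query(application="Tableau")
route(dashboard)
pool_at_start = dashboard.pool

dashboard.total_runtime = 121
apply_triggers(dashboard)
pool_after_rule = dashboard.pool
print(pool_at_start, pool_after_rule)
```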
Decentralized Guaranteed Resources
• A guaranteed task for each resource (executor slots)
• HS2 gives N guaranteed tasks to an AM based on configured resource plan
• Each AM marks N of its most important tasks as guaranteed at any given time
• Guaranteed tasks pre-empt speculative tasks
Guaranteed Tasks – BI and ETL Example
[Figure: HS2 hands out guaranteed tasks; LLAP Daemon 1, LLAP Daemon 2 and LLAP Daemon 3 each have a wait queue and executors; 18 executors total]
• BI pool (80% = 14 guaranteed), Query 1: 10 active tasks (all running): 10 guaranteed (running), 4 guaranteed unused for now
• ETL pool (20% = 4 guaranteed), Query 2: 19 active tasks (8 running): 4 guaranteed (4 running), 15 speculative (4 running)
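The split of 18 executors into 14 and 4 guaranteed tasks follows from the pool percentages once fractions are rounded. The slides do not say how HS2 rounds; the sketch below assumes a largest-remainder scheme and percentages that sum to 100, and the function name guaranteed_slots is hypothetical.

```python
def guaranteed_slots(pool_percent, total_executors):
    """Split executors across pools by percentage (largest-remainder rounding).

    Assumption: the percentages sum to 100. Integer arithmetic avoids
    floating-point surprises.
    """
    scaled = {pool: pct * total_executors for pool, pct in pool_percent.items()}
    slots = {pool: value // 100 for pool, value in scaled.items()}
    leftover = total_executors - sum(slots.values())
    # Hand the remaining slots to the pools with the largest fractional part.
    by_remainder = sorted(scaled, key=lambda pool: scaled[pool] % 100, reverse=True)
    for pool in by_remainder[:leftover]:
        slots[pool] += 1
    return slots

slots = guaranteed_slots({"bi": 80, "etl": 20}, 18)
print(slots)
```

With 18 executors, 80% is 14.4 and 20% is 3.6, so the one leftover slot goes to the ETL pool, matching the figure.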
Caching
Caching for BI Workloads
• Fine-grained (columnar), compact (dictionary, RLE encoded)
• Important due to projections over many wide EDW tables
• Prioritized – indexes are cached with higher priority
• Important to make use of predicate pushdown
• Off-heap (no extra GC), supports SSD
• LRFU replacement policy avoids the damage from large scans
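Why LRFU limits the damage from large scans can be shown with a small model. This is a textbook-style simplification, not LLAP's implementation: each entry keeps a combined recency/frequency score that decays with time, and the entry with the lowest decayed score is evicted. The class name LrfuCache, the decay parameter lam and the logical clock are assumptions made for illustration.

```python
class LrfuCache:
    def __init__(self, capacity, lam=0.1):
        self.capacity = capacity
        self.lam = lam          # 0 behaves like LFU, larger values approach LRU
        self.clock = 0          # logical time, advanced on every access
        self.entries = {}       # key -> (score, time of last access)

    def _decayed(self, score, last, now):
        # Older contributions count less: halve the score every 1/lam ticks.
        return score * 0.5 ** (self.lam * (now - last))

    def access(self, key):
        """Record an access; return True on a hit, False on a miss."""
        self.clock += 1
        now = self.clock
        if key in self.entries:
            score, last = self.entries[key]
            self.entries[key] = (1.0 + self._decayed(score, last, now), now)
            return True
        if len(self.entries) >= self.capacity:
            victim = min(
                self.entries,
                key=lambda k: self._decayed(*self.entries[k], now),
            )
            del self.entries[victim]
        self.entries[key] = (1.0, now)
        return False

cache = LrfuCache(capacity=3)
for _ in range(5):              # two hot blocks, touched repeatedly
    cache.access("a")
    cache.access("b")
for i in range(10):             # one large scan over blocks used only once
    cache.access(f"s{i}")
survivors = sorted(cache.entries)
print(survivors)
```

Under plain LRU with the same capacity, the scan would push both hot blocks out; here each scanned block evicts the previous scanned block instead.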
Caching for BI Workloads – Formats, Zero-ETL
• ORC, Parquet
• Cached natively
• Zero-ETL analytics on CSV and JSON data with text caching
• Text is efficiently encoded in background; once cached, queries speed up
In-memory Processing – Native Columnar (ORC)
[Figure: I/O threads use a read planner and a compression codec to read compressed data from the distributed FS; the ORC decoder produces compact encoded column data (col1, col2) held in the off-heap cache, with an SSD cache and a replacement policy; on the execution thread, a fragment of Hive operators applies vectorized processing to native data vectors (col1, col2)]
Running Hive queries fast in the cloud
Nita Dembla
Wednesday, June 20
4:00 PM - 4:40 PM
Grand Ballroom 220C
Druid + Apache Hive
Layer | Data Access Pattern | Features
Hive Layer | Large-scale analytics | Joins, Subqueries, Windowing Functions, Transformations, Complex Aggregations, Advanced Sorting, UDFs
Druid Layer | Needle-in-a-haystack queries with large numbers of dimensions | Dimensional Aggregates, Top N Queries, Min/Max Values, Timeseries Queries, Approximate Distinct Count, Approximate Histograms
Druid Integration
• Pushdown of aggregate queries
• Pushdown of complex expressions
• Improvements in Druid to support SQL-standard NULL semantics
• Store materialized views in Druid
Hive 3: Real-time Ingestion
[Figure: Kafka-Druid-Hive ingest; Hive and Druid serving real-time analytics]
• Druid answers in near real-time
Druid and Hive Together: Interactive Realtime Analytics at Scale
Nishant Bangarwa
Tuesday, June 19
4:50 PM - 5:30 PM
Grand Ballroom 220B
ACID v2
• New on-disk storage format for ACID tables
• Run major compactions before you upgrade
• Update = Delete + Insert
• Performance on par with non-ACID tables
• Support for LOAD statements
• New streaming ingestion library
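"Update = Delete + Insert" can be pictured with a toy table that never modifies a stored row in place. This is a conceptual sketch only, not the ACID v2 file format: AcidTable and its fields are hypothetical, and real tables track transactions and files, which this model leaves out.

```python
from dataclasses import dataclass, field

@dataclass
class AcidTable:
    rows: dict = field(default_factory=dict)    # row id -> value (insert events)
    deleted: set = field(default_factory=set)   # row ids named by delete events
    next_row_id: int = 0

    def insert(self, value):
        row_id = self.next_row_id
        self.next_row_id += 1
        self.rows[row_id] = value
        return row_id

    def delete(self, row_id):
        # The old row stays on "disk"; only a delete event is recorded.
        self.deleted.add(row_id)

    def update(self, row_id, value):
        # An update is a delete of the old row plus an insert of the new one.
        self.delete(row_id)
        return self.insert(value)

    def scan(self):
        """Readers see inserted rows minus those hidden by delete events."""
        return [v for rid, v in sorted(self.rows.items()) if rid not in self.deleted]

table = AcidTable()
first = table.insert("x")
table.insert("y")
replacement = table.update(first, "x2")
visible = table.scan()
print(visible)
```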
Insert-only Tables
• Transactional Semantics for non-ORC tables
• For INSERT INTO and INSERT OVERWRITE
• With near-zero overhead
• No rename() - Cloud friendly
Transactional Operations in Apache Hive
Eugene Koifman
Wednesday, June 20
11:50 AM - 12:30 PM
Executive Ballroom 210A/E
Disaster Recovery for Hive Data
[Figure: data sets A and B replicated between On-Premise Data Center (a) and On-Premise Data Center (b), under centralized security and governance; Scheduled Policy (A) runs at 2am, 10am, 6pm daily; Scheduled Policy (B) runs at 2am daily]
1. Data replication with scheduled policy
2. Disaster takes down Data Center (b)
3. Failover to Data Center (a); data set B made active
4. Active data set B changes to B’ in Data Center (a)
Hive-based Replication
• Replv2 introduces new REPL commands
• Incremental replication - only copy delta changes
• Point-in-time replication
• Hive maintains the replication state
• Additional support for other database objects, e.g. functions and constraints
• Reduced number of copies
Seamless Replication and Disaster Recovery for Apache Hive Warehouse
Sankar Hariappan
Thursday, June 21
9:30 AM - 10:10 AM
Meeting Room 211A/B/C/D
One Metastore to Rule Them All
[Figure: HMS shared by Hive LLAP, Hive on Tez and Spark, over HDFS/S3 and Kafka, alongside Atlas, Ranger and Schema Registry (SR)]
Between Us and the Grand Vision
• Make HMS separable from Hive
• Standalone Metastore
• Unify HMS and Schema Registry so batch and streaming can see each other’s data
• Also reduces the number of metadata systems admins have to install and maintain
Sharing Metadata Across the Data Lake and Streams
Alan Gates
Wednesday, June 20
11:50 AM - 12:30 PM
Meeting Room 230A
External Access – Spark LLAP
External Access – Relational View for Everyone
• Hive-on-Tez and other DAG executors can use LLAP directly
• LLAP also provides a "relational datanode" view of the data
• Anyone (with access) can push the (approved) code in, from complex query fragments to
simple data reads
• E.g. a Spark DataFrame can be created with LlapInputFormat
• Gives external services access to:
• Hive data: centralized, secure data access
• Ability to read all Hive table types, like ACID transactional tables
• Hive features: from column-level security, to LLAP columnar cache
Support Row/Column-level Security in Spark
[Screenshots: spark-shell and pyspark sessions]
What Is Required?
• Apache Ranger
• Apache Hive with LLAP
• Spark-LLAP
• A library to integrate above tech with SparkSQL
HiveServer2 + LLAP + Ranger
[Figure: a client app, HiveServer2 (plan generation with Ranger dynamic policies), the Hive query coordinator and LLAP daemons inside a YARN cluster; arrows numbered 1 to 5 match the steps below]
SQL Query: select name from users
Plan: TableScan: users; Filter: state = ‘CA’; Projection: mask(name)
1. Client sends query to HiveServer2.
2. Query plan generation by HiveServer2. Ranger security policies applied. Plan modified based on dynamic security policies.
3. Query plan sent to query coordinator.
4. Query plan sent to LLAP daemons for execution. Filtering/masking performed.
5. Results consolidated and sent to client.
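The effect of the modified plan in the slide (a row filter on state plus a masked projection of name) can be sketched as follows. This is an illustration only: secured_scan and mask are hypothetical helpers, the sample rows are made up, and this mask is a simple stand-in rather than the masking function Hive or Ranger actually applies.

```python
def mask(value):
    """Stand-in masking function: hide every character."""
    return "x" * len(value)

def secured_scan(rows, row_filter, column_masks, columns):
    """Apply a row filter, then project the requested columns through any masks."""
    result = []
    for row in rows:
        if not row_filter(row):
            continue
        result.append(
            {col: column_masks.get(col, lambda v: v)(row[col]) for col in columns}
        )
    return result

users = [
    {"name": "Ann", "state": "CA"},
    {"name": "Bob", "state": "NY"},
    {"name": "Carla", "state": "CA"},
]

# The user asked for: select name from users
# The policy-adjusted plan adds the filter and the mask before execution.
result = secured_scan(
    users,
    row_filter=lambda row: row["state"] == "CA",
    column_masks={"name": mask},
    columns=["name"],
)
print(result)
```

The client never sees the unfiltered or unmasked data, because the filter and mask are part of the plan that runs inside the LLAP daemons.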
HiveServer2 + LLAP + Ranger
[Figure: a client app using LLAP InputFormat, HiveServer2 (plan generation with Ranger dynamic policies), the Hive query coordinator and LLAP daemons inside a YARN cluster; arrows numbered 1 to 4 match the steps below]
SQL Query: select name from users
Plan: TableScan: users; Filter: state = ‘CA’; Projection: mask(name)
1. Client requests data locations known as “splits” from HiveServer2.
2. Query plan generation by HiveServer2. Ranger security policies applied. Plan modified based on dynamic security policies.
3. Splits returned to client which include signed query plan.
4. LLAP splits used by client to securely submit query plan to LLAP. Filtering/masking performed. Data returned to client.
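The slide says the splits carry a signed query plan but does not say how the signature is produced. As one way to picture why signing matters, the sketch below assumes an HMAC over the plan with a secret shared by HiveServer2 and the LLAP daemons; the secret, function names and plan encoding are all hypothetical.

```python
import hashlib
import hmac

# Assumption: a secret known to HiveServer2 and LLAP, never to the client.
SHARED_SECRET = b"hypothetical-cluster-secret"

def sign_plan(plan: bytes) -> str:
    """HiveServer2 side: attach a signature to the policy-adjusted plan."""
    return hmac.new(SHARED_SECRET, plan, hashlib.sha256).hexdigest()

def llap_accepts(plan: bytes, signature: str) -> bool:
    """LLAP side: run the plan only if the signature still matches."""
    return hmac.compare_digest(sign_plan(plan), signature)

plan = b"TableScan: users; Filter: state = 'CA'; Projection: mask(name)"
split = {"plan": plan, "signature": sign_plan(plan)}

# A client that strips the mask from the plan cannot produce a valid signature.
tampered = plan.replace(b"mask(name)", b"name")

accepted = llap_accepts(split["plan"], split["signature"])
rejected = llap_accepts(tampered, split["signature"])
print(accepted, rejected)
```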
“Other” Improvements
• Query reoptimization
• Constraints
• Vectorization
• Query Cache
• Active/Passive HS2 HA for LLAP
• HLL BitVectors
• CachedStore
• Numerous enhancements in Spark Integration
Future
• Standalone Metastore
• Materialized Views – Automatic Recommendations
• Better integration with cloud storage
• HS2 scalability
Thanks to the open source community for continued success over the last 10 years.
Now, onwards to the next 10 years.

Contenu connexe

Tendances

Query Compilation in Impala
Query Compilation in ImpalaQuery Compilation in Impala
Query Compilation in ImpalaCloudera, Inc.
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesDataWorks Summit
 
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenApache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenDatabricks
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseDataWorks Summit
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeDatabricks
 
Ozone- Object store for Apache Hadoop
Ozone- Object store for Apache HadoopOzone- Object store for Apache Hadoop
Ozone- Object store for Apache HadoopHortonworks
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep divet3rmin4t0r
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
Apache NiFi Meetup - Introduction to NiFi Registry
Apache NiFi Meetup - Introduction to NiFi RegistryApache NiFi Meetup - Introduction to NiFi Registry
Apache NiFi Meetup - Introduction to NiFi RegistryBryan Bende
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practicelarsgeorge
 
Introduction and HDInsight best practices
Introduction and HDInsight best practicesIntroduction and HDInsight best practices
Introduction and HDInsight best practicesAshish Thapliyal
 
High throughput data replication over RAFT
High throughput data replication over RAFTHigh throughput data replication over RAFT
High throughput data replication over RAFTDataWorks Summit
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per SecondAmazon Web Services
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBaseHBaseCon
 
Storage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesStorage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesDataWorks Summit
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 

Tendances (20)

Query Compilation in Impala
Query Compilation in ImpalaQuery Compilation in Impala
Query Compilation in Impala
 
Apache Nifi Crash Course
Apache Nifi Crash CourseApache Nifi Crash Course
Apache Nifi Crash Course
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practices
 
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenApache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
 
Ozone- Object store for Apache Hadoop
Ozone- Object store for Apache HadoopOzone- Object store for Apache Hadoop
Ozone- Object store for Apache Hadoop
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Apache NiFi Meetup - Introduction to NiFi Registry
Apache NiFi Meetup - Introduction to NiFi RegistryApache NiFi Meetup - Introduction to NiFi Registry
Apache NiFi Meetup - Introduction to NiFi Registry
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practice
 
Introduction and HDInsight best practices
Introduction and HDInsight best practicesIntroduction and HDInsight best practices
Introduction and HDInsight best practices
 
High throughput data replication over RAFT
High throughput data replication over RAFTHigh throughput data replication over RAFT
High throughput data replication over RAFT
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 
Nifi
NifiNifi
Nifi
 
Apache hadoop hbase
Apache hadoop hbaseApache hadoop hbase
Apache hadoop hbase
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
 
Storage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesStorage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on Kubernetes
 
An Introduction to Druid
An Introduction to DruidAn Introduction to Druid
An Introduction to Druid
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 

Similaire à What's new in apache hive

Apache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration storyApache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration storySunil Govindan
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?DataWorks Summit
 
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善HortonworksJapan
 
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?DataWorks Summit
 
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019alanfgates
 
Hive acid and_2.x new_features
Hive acid and_2.x new_featuresHive acid and_2.x new_features
Hive acid and_2.x new_featuresAlberto Romero
 
LLAP: Building Cloud First BI
LLAP: Building Cloud First BILLAP: Building Cloud First BI
LLAP: Building Cloud First BIDataWorks Summit
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash CourseDataWorks Summit
 
Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo DataWorks Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionDataWorks Summit
 
Curing the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging ManagerCuring the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging ManagerDataWorks Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionDataWorks Summit
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?DataWorks Summit
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoDataWorks Summit
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionDataWorks Summit
 
HDF 3.1 : An Introduction to New Features
HDF 3.1 : An Introduction to New FeaturesHDF 3.1 : An Introduction to New Features
HDF 3.1 : An Introduction to New FeaturesTimothy Spann
 
Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019alanfgates
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?DataWorks Summit
 
SoCal BigData Day
SoCal BigData DaySoCal BigData Day
SoCal BigData DayJohn Park
 

Similaire à What's new in apache hive (20)

Apache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration storyApache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration story
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
 
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
 
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?
 
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
 
Hive acid and_2.x new_features
Hive acid and_2.x new_featuresHive acid and_2.x new_features
Hive acid and_2.x new_features
 
LLAP: Building Cloud First BI
LLAP: Building Cloud First BILLAP: Building Cloud First BI
LLAP: Building Cloud First BI
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
 
Containers and Big Data
Containers and Big DataContainers and Big Data
Containers and Big Data
 
Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
 
Curing the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging ManagerCuring the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging Manager
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the Union
 
HDF 3.1 : An Introduction to New Features
HDF 3.1 : An Introduction to New FeaturesHDF 3.1 : An Introduction to New Features
HDF 3.1 : An Introduction to New Features
 
Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
 
SoCal BigData Day
SoCal BigData DaySoCal BigData Day
SoCal BigData Day
 

Plus de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
What's new in Apache Hive

  • 1. What is new in Apache Hive? Ashutosh Chauhan. © Hortonworks Inc. 2011–2018. All rights reserved.
  • 2. Apache Hive – Distant Past – First Five Years
      • Initial use case: batch processing
      • Circa 2008
      • Read-only data
      • MapReduce
      • HiveQL
  • 3. Apache Hive – Past 5 Years
      • Effort to take Hive beyond its batch processing roots
      • Started in Apache Hive 0.10.0 (January 2013)
      • Latest released version: Apache Hive 3.0 (May 2018)
      • Extensive renovation along four different axes
      • Runtime: Enable sub-second queries (LLAP)
      • Compiler: Cost Based Optimizer
      • SQL support: Improved coverage of SQL syntax
      • Transactional Support: ACID
  • 4. Hive – Today
      • Comprehensive ANSI SQL including all TPC-DS Queries.
      • The only Hadoop SQL with ACID MERGE for easy updates.
      • In-Memory caching for MPP performance at Hadoop scale.
      • Enables Per-User dynamic row and column security.
      • Enables Replication and DR for critical workloads.
      • Compatible with every major BI Tool.
      • Proven at 300+ PB Scale.
  • 5. Apache Hive: Fast Facts
      • Most Queries Per Hour: 100,000 Queries Per Hour
      • Analytics Performance: 100 Million rows/s Per Node
      • Largest Hive Warehouse: 300+ PB Raw Storage
      • Largest Cluster: 4,500+ Nodes
  • 6. Hive: Serving ETL Workloads to BI Systems
      • Diagram labels: BI systems; Materialized view; Improved Stats; Constraints; Query Result Cache; Workload management; ACID v2
      • Results return from HDFS/cache directly
      • Reduce load from repetitive queries
      • Allows more queries to be run in parallel
      • Reduce resource starvation in large clusters
      • Also: Active/Passive HA
      • More "tools" for optimizer to use
      • More "tools" for DBAs to tune/optimize
      • Invisible tuning of DB from users' perspective
      • ACID v2 is as fast as regular tables
  • 7. SIGMOD Software Systems Award
      • "For developing seminal software systems that served to bring relational-style declarative programming to the Hadoop ecosystem."
      • Postgres, SQLite and MonetDB
  • 8. Hive – How Did We Get Here?
      • LLAP Enhancements
      • CBO Enhancements
      • ACID Enhancements
  • 9. Materialized Views in Hive
  • 10. Accelerating Query Processing
      • Change data physical properties (distribute, sort)
      • Filter rows
      • Denormalize
      • Preaggregate
      • Optimization based on access patterns
  • 11. Materialized Views to Rescue
      • Speed up aggregates and joins via MVs
      • View navigation via CBO/Calcite
      • Optionally allow rewrites against out-of-date materializations
  • 12. Materialized Views in Hive 3
      • Multiple storage options: Hive, Druid
      • Multiple options to control materialized views lifecycle
  • 13. Materialized View-based Rewriting
      • Materialized view definition:
          CREATE MATERIALIZED VIEW mv AS
          SELECT <dims>, lo_revenue,
                 lo_extendedprice * lo_discount AS d_price,
                 lo_revenue - lo_supplycost
          FROM customer, dates, lineorder, part, supplier
          WHERE lo_orderdate = d_datekey
            AND lo_partkey = p_partkey
            AND lo_suppkey = s_suppkey
            AND lo_custkey = c_custkey;
      • Query:
          SELECT SUM(lo_extendedprice * lo_discount)
          FROM lineorder, dates
          WHERE lo_orderdate = d_datekey
            AND d_year = 2013
            AND lo_discount BETWEEN 1 AND 3;
      • Materialized view-based rewriting:
          SELECT SUM(d_price)
          FROM mv
          WHERE d_year = 2013
            AND lo_discount BETWEEN 1 AND 3;
      • Diagram labels: supplier, part, dates, customer, lineorder; mv contents; Query results
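The rewriting on slide 13 can be checked on toy data. The following is a minimal sketch, not Hive itself: it uses Python's built-in `sqlite3`, stands in for the materialized view with an ordinary table created by `CREATE TABLE ... AS`, and keeps only the columns the slide's query touches. The sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE dates (d_datekey INTEGER PRIMARY KEY, d_year INTEGER);
CREATE TABLE lineorder (
    lo_orderdate INTEGER, lo_extendedprice REAL, lo_discount INTEGER
);
INSERT INTO dates VALUES (1, 2012), (2, 2013), (3, 2013);
INSERT INTO lineorder VALUES
    (1, 100.0, 2), (2, 200.0, 1), (2, 50.0, 5), (3, 10.0, 3);

-- Stand-in for the materialized view: a plain table holding the
-- precomputed join and the derived d_price column.
CREATE TABLE mv AS
SELECT d_year, lo_discount, lo_extendedprice * lo_discount AS d_price
FROM lineorder, dates
WHERE lo_orderdate = d_datekey;
""")

# The user's query against the base tables.
original = cur.execute("""
    SELECT SUM(lo_extendedprice * lo_discount)
    FROM lineorder, dates
    WHERE lo_orderdate = d_datekey
      AND d_year = 2013 AND lo_discount BETWEEN 1 AND 3
""").fetchone()[0]

# The rewritten query: no join, reads only the precomputed table.
rewritten = cur.execute("""
    SELECT SUM(d_price) FROM mv
    WHERE d_year = 2013 AND lo_discount BETWEEN 1 AND 3
""").fetchone()[0]

print(original, rewritten)
```

Both queries return the same sum because the view already contains the join result and the product expression. In Hive the optimizer performs this substitution automatically; here it is done by hand.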
  • 14. Rebuilding Materialized Views
      • ALTER MATERIALIZED VIEW [db_name.]materialized_view_name REBUILD;
      • Incremental materialized view maintenance
      • Only refresh data that has changed in source tables
  • 15. Accelerating Query Processing with Materialized Views in Apache Hive (Jesus Camacho Rodriguez; Tuesday, June 19, 2:50 PM - 3:30 PM; Executive Ballroom 210A/E)
  • 16. Workload Management
  • 17. Overview
      • Effectively share LLAP cluster resources
      • Resource allocation per user policy; separate ETL and BI, etc.
      • Resource-based guardrails
      • Protect against long running queries, high memory usage
      • Improved, query-aware scheduling
      • Scheduler is aware of query characteristics, types, etc.
      • Fragments easy to pre-empt compared to containers
      • Queries get guaranteed fractions of the cluster, but can use empty space
  • 18. Resource Plans
      • Resource plan is a workload management configuration for a cluster
      • Switching is allowed without stopping queries, e.g. based on time of day
      • Cluster is divided into query pools (optionally nested)
      • Each pool defines query parallelism, cluster resources percentage
      • Queries are automatically routed to pools based on user name, app, etc.
      • Rules (Triggers) to kill, move, or deprioritize queries based on DFS usage, runtime, etc.
      • Example:
          CREATE RESOURCE PLAN daytime;
          CREATE POOL bi IN daytime (resource_percent=75, concurrent_queries=5);
          CREATE POOL etl IN daytime (resource_percent=25, concurrent_queries=10);
          CREATE RULE downgrade IN daytime WHEN total_runtime > 120 THEN MOVE etl;
          ADD RULE downgrade TO bi IN daytime;
          CREATE MAPPING tableau IN daytime (application='Tableau', pool=bi);
          ALTER PLAN daytime SET default_pool='etl';
          APPLY PLAN daytime;
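The routing and trigger behaviour described on slide 18 can be modelled in a few lines. This is a rough sketch under stated assumptions, not Hive's implementation: the class names `Pool`, `Trigger` and `ResourcePlan` are hypothetical, and the slide does not state the unit of `total_runtime`, so the limit of 120 is carried over as a bare number.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    name: str
    resource_percent: int
    concurrent_queries: int


@dataclass
class Trigger:
    name: str
    runtime_limit: int   # same (unstated) unit as total_runtime on the slide
    move_to: str


@dataclass
class ResourcePlan:
    pools: dict = field(default_factory=dict)
    mappings: dict = field(default_factory=dict)   # application -> pool name
    triggers: dict = field(default_factory=dict)   # pool name -> [Trigger]
    default_pool: str = ""

    def route(self, application):
        """Pick the pool for a new query; unmapped apps use the default."""
        return self.mappings.get(application, self.default_pool)

    def apply_triggers(self, pool, total_runtime):
        """Return the pool a running query should be in after rule checks."""
        for trigger in self.triggers.get(pool, []):
            if total_runtime > trigger.runtime_limit:
                return trigger.move_to
        return pool


# Mirror of the slide's "daytime" plan.
daytime = ResourcePlan(default_pool="etl")
daytime.pools["bi"] = Pool("bi", resource_percent=75, concurrent_queries=5)
daytime.pools["etl"] = Pool("etl", resource_percent=25, concurrent_queries=10)
daytime.triggers["bi"] = [Trigger("downgrade", runtime_limit=120, move_to="etl")]
daytime.mappings["Tableau"] = "bi"

print(daytime.route("Tableau"))            # mapped application
print(daytime.route("nightly-load"))       # falls back to the default pool
print(daytime.apply_triggers("bi", 121))   # long-running BI query is moved
```

A Tableau query starts in `bi` and is moved to `etl` once it exceeds the runtime limit, which keeps the BI pool free for short interactive queries.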
  • 19. Decentralized Guaranteed Resources
      • A guaranteed task for each resource (executor slot)
      • HS2 gives N guaranteed tasks to an AM based on the configured resource plan
      • Each AM marks N of its most important tasks as guaranteed at any given time
      • Guaranteed tasks pre-empt speculative tasks
  • 20. Guaranteed Tasks – BI and ETL Example
      • Pools: BI (80% = 14 guaranteed), ETL (20% = 4 guaranteed)
      • Query 1: 10 active tasks (running): 10 guaranteed (running), 4 unused for now
      • Query 2: 19 active tasks (8 running): 4 guaranteed (4 running), 15 speculative (4 running)
      • 18 executors total
      • Diagram labels: HS2; LLAP Daemon 1, LLAP Daemon 2, LLAP Daemon 3; Wait Queue; Executors
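The numbers on slide 20 can be reproduced with a small calculation. This sketch makes two assumptions that the slides do not confirm: that fractional executor shares are rounded with a largest-remainder rule (which happens to yield 14 and 4 from 80% and 20% of 18), and that Query 1 runs in the BI pool and Query 2 in the ETL pool. The function names are hypothetical.

```python
def split_guaranteed(total_executors, pool_percent):
    """Largest-remainder split of executor slots across pools (assumed rule)."""
    exact = {p: total_executors * pct / 100 for p, pct in pool_percent.items()}
    alloc = {p: int(x) for p, x in exact.items()}
    leftover = total_executors - sum(alloc.values())
    # Hand the remaining slots to the pools with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda p: exact[p] - alloc[p], reverse=True)
    for p in by_remainder[:leftover]:
        alloc[p] += 1
    return alloc


def running_tasks(total_executors, guaranteed, active):
    """Guaranteed tasks run first; idle slots are lent to speculative tasks."""
    run_guaranteed = {p: min(active[p], guaranteed[p]) for p in guaranteed}
    free = total_executors - sum(run_guaranteed.values())
    run_speculative = {}
    for p in guaranteed:
        waiting = active[p] - run_guaranteed[p]
        run_speculative[p] = min(waiting, free)
        free -= run_speculative[p]
    return run_guaranteed, run_speculative


guaranteed = split_guaranteed(18, {"bi": 80, "etl": 20})
run_g, run_s = running_tasks(18, guaranteed, {"bi": 10, "etl": 19})
print(guaranteed, run_g, run_s)
```

With 10 active BI tasks and 19 active ETL tasks, BI runs 10 guaranteed tasks and leaves 4 of its slots idle; ETL runs its 4 guaranteed tasks plus 4 speculative ones in the borrowed slots, for 8 running in total. If BI later needs its slots back, those speculative tasks are the ones that get pre-empted.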
  • 21. Caching
  • 22. Caching for BI Workloads
      • Fine-grained (columnar), compact (dictionary, RLE encoded)
      • Important due to projections over many wide EDW tables
      • Prioritized: indexes are cached with higher priority
      • Important to make use of predicate pushdown
      • Off-heap (no extra GC), supports SSD
      • LRFU replacement policy avoids the damage from large scans
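Why an LRFU policy resists large scans can be seen in a small model. The sketch below implements the textbook LRFU scoring rule (each entry keeps a combined recency and frequency value that decays by a factor of 2^(-lambda) per time step and gains 1 on every hit); it is not Hive's cache code, and the class name `LRFUCache` and the value of `lam` are illustrative choices.

```python
class LRFUCache:
    """Toy LRFU cache: evicts the entry with the lowest decayed score."""

    def __init__(self, capacity, lam=0.1):
        self.capacity = capacity
        self.lam = lam          # 0 behaves like LFU, larger values approach LRU
        self.clock = 0
        self.entries = {}       # key -> (score, time of last access)

    def _decayed(self, key):
        score, last = self.entries[key]
        return score * 2 ** (-self.lam * (self.clock - last))

    def access(self, key):
        """Touch a key; return True on a cache hit, False on a miss."""
        self.clock += 1
        if key in self.entries:
            self.entries[key] = (1.0 + self._decayed(key), self.clock)
            return True
        if len(self.entries) >= self.capacity:
            victim = min(self.entries, key=self._decayed)
            del self.entries[victim]
        self.entries[key] = (1.0, self.clock)
        return False


cache = LRFUCache(capacity=3)
for _ in range(5):
    cache.access("hot")          # frequently used block
for i in range(10):
    cache.access(f"scan-{i}")    # one-off blocks from a large scan

print("hot" in cache.entries)
```

After ten one-off scan blocks pass through a three-slot cache, the frequently used block is still resident, because every scan block has a lower score than it does. A plain LRU cache of the same size would have evicted it.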
  • 23. Caching for BI Workloads – Formats, Zero-ETL
      • ORC, Parquet
      • Cached natively
      • Zero-ETL analytics on CSV and JSON data with text caching
      • Text is efficiently encoded in background; once cached, queries speed up
  • 24. In-memory Processing – Native Columnar (ORC)
      • Diagram labels: Distributed FS (compressed data); I/O threads (read planner, compression codec, decoder: ORC); off-heap cache and SSD cache (compact encoded data, replacement policy); execution thread (fragment, Hive operators, vectorized processing over native data vectors col1, col2)
  • 25. Running Hive queries fast in the cloud (Nita Dembla; Wednesday, June 20, 4:00 PM - 4:40 PM; Grand Ballroom 220C)
  • 26. Druid + Apache Hive
      • Hive Layer
          Data access pattern: Large Scale analytics
          Features: Joins, Subqueries, Windowing Functions, Transformations, Complex Aggregations, Advanced Sorting, UDFs
      • Druid Layer
          Data access pattern: Needles-in-a-haystack queries with large numbers of dimensions
          Features: Dimensional Aggregates, Top N Queries, Min/Max Values, Timeseries Queries, Approximate Distinct Count, Approximate Histograms
  • 27. Druid Integration
      • Pushdown of aggregate queries
      • Pushdown of complex expressions
      • Improvements in Druid to support SQL standard NULL semantics
      • Store MV in Druid
  • 28. Hive 3: Real-time Ingestion
      • Diagram labels: Hive; Kafka-Druid-Hive ingest; Druid; Real-time analytics
      • Druid answers in near real-time
  • 29. Druid and Hive Together: Interactive Realtime Analytics at Scale (Nishant Bangarwa; Tuesday, June 19, 4:50 PM - 5:30 PM; Grand Ballroom 220B)
  • 30. ACID v2
      • New on-disk storage format for ACID tables
      • Run major compactions before you upgrade
      • Update = Delete + Insert
      • Performance on par with non-ACID tables
      • Support for load statements
      • New streaming ingestion library
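The "Update = Delete + Insert" point can be illustrated with a toy append-only table. This is a conceptual sketch only: the class `AcidTable`, its row identifiers of the form `(write_id, row_index)`, and its in-memory lists are assumptions made for illustration and do not reflect Hive's actual on-disk layout.

```python
class AcidTable:
    """Toy append-only table: nothing is ever modified in place."""

    def __init__(self):
        self.inserts = []        # (row_id, row) insert events, in write order
        self.deletes = set()     # row_ids named by delete events
        self._next_write_id = 1

    def insert(self, rows):
        write_id = self._next_write_id
        self._next_write_id += 1
        ids = [(write_id, i) for i in range(len(rows))]
        self.inserts.extend(zip(ids, rows))
        return ids

    def delete(self, row_id):
        self.deletes.add(row_id)

    def update(self, row_id, new_row):
        """An update is a delete event plus a freshly inserted row."""
        self.delete(row_id)
        return self.insert([new_row])[0]

    def scan(self):
        """Readers see inserted rows minus those with a delete event."""
        return [row for rid, row in self.inserts if rid not in self.deletes]


table = AcidTable()
first, second = table.insert([{"id": 1, "v": "a"}, {"id": 2, "v": "b"}])
new_id = table.update(first, {"id": 1, "v": "a2"})
print(table.scan())
```

Because old row versions are never rewritten, writers only append, and a reader reconstructs the current state by subtracting delete events from insert events.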
  • 31. Insert-only Tables
      • Transactional semantics for non-ORC tables
      • For INSERT INTO and INSERT OVERWRITE
      • With near-zero overhead
      • No rename(): cloud friendly
  • 32. Transactional Operations in Apache Hive (Eugene Koifman; Wednesday, June 20, 11:50 AM - 12:30 PM; Executive Ballroom 210A/E)
  • 33. Disaster Recovery for Hive Data
      • Diagram labels: On-Premise Data Center (a) and On-Premise Data Center (b), each showing data sets A and B; Centralized Security and Governance
      • Scheduled Policy (A): 2am, 10am, 6pm daily
      • Scheduled Policy (B): 2am daily
      • Step 1: Data replication with scheduled policy
      • Step 2: Disaster takes down Data Center (b)
      • Step 3: Failover to Data Center (a); data set B made active
      • Step 4: Active data set B changes to B' in Data Center (a)
  • 34. Hive-based Replication
      • Replv2 introduces new REPL commands
      • Incremental replication: only copy delta changes
      • Point-in-time replication
      • Hive maintains the replication state
      • Additional support for other database objects, e.g. functions, constraints, etc.
      • Reduce number of copies
  • 35. Seamless Replication and Disaster Recovery for Apache Hive Warehouse (Sankar Hariappan; Thursday, June 21, 9:30 AM - 10:10 AM; Meeting Room 211A/B/C/D)
  • 36. One Metastore to Rule Them All
      • Diagram labels: HDFS/S3; Kafka; Hive LLAP; Spark; HMS; Atlas; Ranger; SR; Hive on Tez
  • 37. Between Us and the Grand Vision
      • Make HMS separable from Hive
      • Standalone Metastore
      • Unify HMS and Schema Registry so batch and streaming can see each other's data
      • Also reduces the number of metadata systems admins have to install and maintain
  • 38. Sharing Metadata Across the Data Lake and Streams (Alan Gates; Wednesday, June 20, 11:50 AM - 12:30 PM; Meeting Room 230A)
  • 39. External Access – Spark LLAP
  • 40. External Access – Relational View for Everyone
      • Hive-on-Tez and other DAG executors can use LLAP directly
      • LLAP also provides a "relational datanode" view of the data
      • Anyone (with access) can push the (approved) code in, from complex query fragments to simple data reads
      • E.g. a Spark DataFrame can be created with LlapInputFormat
      • Gives the external services access to:
      • Hive data: centralized, secure data access
      • Ability to read all Hive table types, like ACID transactional tables
      • Hive features: from column-level security to the LLAP columnar cache
  • 41. Support Row/Column-level Security in Spark
      • Diagram labels: spark-shell; pyspark
  • 42. What Is Required?
      • Apache Ranger
      • Apache Hive with LLAP
      • Spark-LLAP
      • A library to integrate the above tech with SparkSQL
  • 43. HiveServer2 + LLAP + Ranger
      • Diagram labels: Client App; YARN Cluster; HiveServer2 (Plan Generation); Ranger Dynamic Policies; Hive Query Coordinator; LLAP Daemons
      • SQL Query: select name from users
      • Plan: TableScan: users; Filter: state = 'CA'; Projection: mask(name)
      • Step 1: Client sends query to HiveServer2.
      • Step 2: Query plan generation by HiveServer2. Ranger security policies applied. Plan modified based on dynamic security policies.
      • Step 3: Query plan sent to query coordinator.
      • Step 4: Query plan sent to LLAP daemons for execution. Filtering/masking performed.
      • Step 5: Results consolidated and sent to client.
  • 44. LLAP InputFormat (HiveServer2 + LLAP + Ranger)
      • Diagram labels: Client App; YARN Cluster; HiveServer2 (Plan Generation); Ranger Dynamic Policies; Hive Query Coordinator; LLAP Daemons
      • SQL Query: select name from users
      • Plan: TableScan: users; Filter: state = 'CA'; Projection: mask(name)
      • Step 1: Client requests data locations known as "splits" from HiveServer2.
      • Step 2: Query plan generation by HiveServer2. Ranger security policies applied. Plan modified based on dynamic security policies.
      • Step 3: Splits returned to client, which include the signed query plan.
      • Step 4: LLAP splits used by client to securely submit query plan to LLAP. Filtering/masking performed. Data returned to client.
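The reason a signed plan lets an untrusted client carry the query plan to LLAP can be sketched with an HMAC. The slides do not say how Hive signs plans, so everything here is an assumption for illustration: the shared key `SECRET`, the JSON plan encoding, and the function names `hs2_get_splits` and `llap_submit` are all hypothetical, and the plan is hard-coded rather than derived from the SQL text.

```python
import hashlib
import hmac
import json

# Hypothetical key shared by HiveServer2 and the LLAP daemons, never the client.
SECRET = b"shared-secret-known-to-hs2-and-llap"


def hs2_get_splits(sql):
    """HS2 side: build the policy-adjusted plan and sign it.

    The plan is hard-coded to match the slide; `sql` is accepted only to
    mirror the shape of the request.
    """
    plan = {"scan": "users", "filter": "state = 'CA'", "project": ["mask(name)"]}
    payload = json.dumps(plan, sort_keys=True).encode()
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return {"plan": payload, "signature": signature}


def llap_submit(split):
    """LLAP side: refuse any plan whose signature does not verify."""
    expected = hmac.new(SECRET, split["plan"], hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, split["signature"]):
        raise PermissionError("plan signature mismatch")
    return json.loads(split["plan"])


split = hs2_get_splits("select name from users")
plan = llap_submit(split)

# A client that strips the masking step invalidates the signature.
tampered = dict(split, plan=split["plan"].replace(b"mask(name)", b"name"))
try:
    llap_submit(tampered)
    tamper_rejected = False
except PermissionError:
    tamper_rejected = True

print(plan["project"], tamper_rejected)
```

The client holds the plan but cannot alter the filter or the masking projection without the daemons noticing, which is what allows the security policies applied in step 2 to survive the trip through external code such as Spark.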
  • 45. "Other" Improvements
      • Query reoptimization
      • Constraints
      • Vectorization
      • Query Cache
      • Active/Passive HS2 HA for LLAP
      • HLL BitVectors
      • CachedStore
      • Numerous enhancements in Spark integration
  • 46. Future
      • Standalone Metastore
      • Materialized Views: Automatic Recommendations
      • Better integration with cloud storage
      • HS2 scalability
  • 47. Thanks to the open source community for continued success over the last 10 years. Now, onwards to the next 10 years.