SlideShare une entreprise Scribd logo
1  sur  32
Télécharger pour lire hors ligne
1 © Hortonworks Inc. 2011–2018. All rights reserved
Apache Hive 3.0, A New Horizon
Alan Gates
Hortonworks Co-founder, Apache Hive PMC member
@alanfgates
2 © Hortonworks Inc. 2011–2018. All rights reserved
Apache Hive – Data Warehousing for Big Data
• Comprehensive ANSI SQL
• Only open source Hadoop SQL with transactions, INSERT/UPDATE/DELETE/MERGE
• BI queries with MPP performance at big data scales
• ETL jobs scale with your cluster
• Enables per-user dynamic row and column security
• Enables replication for HA and DR
• Compatible with every major BI tool
• Proven at 300+ PB scale
3 © Hortonworks Inc. 2011–2018. All rights reserved
Hive on Tez
Deep
Storage
Hadoop Cluster
Tez Container
Query
Executors
Tez Container
Query
Executors
Tez Container
Query
Executors
Tez Container
Query
Executors
Tez AM
Tez AM
HiveServer2
(Query
Endpoint)
ODBC /
JDBC
SQL
Queries
HDFS and
Compatible
S3 WASB Isilon
4 © Hortonworks Inc. 2011–2018. All rights reserved
Hive LLAP - MPP Performance at Hadoop Scale
Deep
Storage
Hadoop Cluster
LLAP Daemon
Query
Executors
LLAP Daemon
Query
Executors
LLAP Daemon
Query
Executors
LLAP Daemon
Query
Executors
Query
Coordinators
Coord-
inator
Coord-
inator
Coord-
inator
HiveServer2
(Query
Endpoint)
ODBC /
JDBC
SQL
Queries In-Memory Cache
(Shared Across All Users)
HDFS and
Compatible
S3 WASB Isilon
5 © Hortonworks Inc. 2011–2018. All rights reserved
Hive3: EDW Analyst Pipeline
BI tools
Materialized
view
Surrogate
key
Constraints
Query
Result
Cache
Workload
management
• Results return
from HDFS/cache
directly
• Reduce load from
repetitive queries
• Allows more
queries to be run
in parallel
• Reduce resource
starvation in large
clusters
• Active/Passive HA
• More “tools” for
optimizer to use
• More ”tools” for
DBAs to
tune/optimize
• Invisible tuning of
DB from users’
perspective
• ACID v2 is as fast as
regular tables
• Hive 3 is optimized
for S3/WASB/GCP
• Support for
JDBC/Kafka/Druid
out of the box
ACID v2
Cloud
Storage
Connectors
6 © Hortonworks Inc. 2011–2018. All rights reserved
New SQL Features
7 © Hortonworks Inc. 2011–2018. All rights reserved
Transactional Read and Write
• Originally Hive supported write only by adding partitions or loading new files into
existing partitions
• Starting in version 0.13, Hive added transactions and INSERT, UPDATE, DELETE
• Supports
• Slow changing dimensions
• Correcting mis-loaded data
• GDPR's right to be forgotten
• Not OLTP!
• Drawbacks:
• Transactional tables had to be stored in ORC and had to be bucketed
• Reading transactional tables was significantly slower than non-transactional
• No support for MERGE or UPSERT functionality
8 © Hortonworks Inc. 2011–2018. All rights reserved
ACID v2
• In 3.0 ACID storage has been reworked
• Performance penalty for ACID is now negligible even when compactor has not run
• With other optimizations ACID can result in speed up (more on this in the performance talk)
• Added MERGE support
• CDC can be regularly merged into a fact table with upsert functionality
• Removed restrictions:
• Tables no longer have to be bucketed
• Non-ORC based tables supported (INSERT & SELECT only)
• Still not OLTP!
9 © Hortonworks Inc. 2011–2018. All rights reserved
Constraints & Defaults
• Helps optimizer to produce better plans
• BI tool integrations
• Data Integrity
• hive.constraint.notnull.enforce = true
• SQL compatibility & offload scenarios
Example:
CREATE TABLE Persons (
ID Int NOT NULL,
Name String NOT NULL,
Age Int,
Creator String DEFAULT CURRENT_USER(),
CreateDate Date DEFAULT CURRENT_DATE(),
PRIMARY KEY (ID) DISABLE NOVALIDATE
);
CREATE TABLE BusinessUnit (
ID Int NOT NULL,
Head Int NOT NULL,
Creator String DEFAULT CURRENT_USER(),
CreateDate Date DEFAULT CURRENT_DATE(),
PRIMARY KEY (ID) DISABLE NOVALIDATE,
CONSTRAINT fk FOREIGN KEY (Head)
REFERENCES Persons(ID) DISABLE
NOVALIDATE
);
10 © Hortonworks Inc. 2011–2018. All rights reserved
Hive Native Replication
• REPL commands added to support replication
• Replication currently done at database level (all tables etc. in the db)
• Copies data together with metadata
• Master/slave on db level
• When first setup, existing data copied
• Then incremental replication - only copies changes
• Hive itself provides primitives for replication, not active daemons
• Used by Hortonworks Data Lifecycle Manager to provide High Availability and Disaster Recovery
• Replication can be between two clusters or between cluster and cloud
11 © Hortonworks Inc. 2011–2018. All rights reserved
Plus More
• Materialized Views with refresh (more in the performance talk)
• Surrogate keys – default values, unique, not monotonically increasing
• SQL Standard Information Schema now supported
• Ranger can now enforce authorization policies for use of global non-builtin UDFs
• Support for TIMESTAMP WITH TIMEZONE data type
• In Hive 3 much work has been done to optimize Hive for object stores
• Hive uses its ACID system to determine which files to read rather than trust the storage
• Moves eliminated where ever possible
• More aggressive caching of file metadata and data to reduce file system operations
• Apache Parquet and text files now supported in LLAP
12 © Hortonworks Inc. 2011–2018. All rights reserved
Workload Management
13 © Hortonworks Inc. 2011–2018. All rights reserved
LLAP Workload Management
• Effectively share LLAP cluster resources
• Resource allocation per user policy; separate ETL and BI, etc.
• Resources based guardrails
• Protect against long running queries, high memory usage
• Improved, query-aware scheduling
• Scheduler is aware of query characteristics, types, etc.
• Fragments easy to pre-empt compared to containers
• Queries get guaranteed fractions of the cluster, but can use empty space
14 © Hortonworks Inc. 2011–2018. All rights reserved
Guardrail Example
Common Triggers
● ELAPSED_TIME
● EXECUTION_TIME
● TOTAL_TASKS
● HDFS_BYTES_READ, HDFS_BYTES_WRITTEN
● CREATED FILES
● CREATED_DYNAMIC_PARTITIONS
Example
CREATE RESOURCE PLAN guardrail;
CREATE TRIGGER guardrail.long_running WHEN EXECUTION_TIME > 2000 DO KILL;
ALTER TRIGGER guardrail.long_running ADD TO UNMANAGED;
ALTER RESOURCE PLAN guardrail ENABLE ACTIVATE;
15 © Hortonworks Inc. 2011–2018. All rights reserved
Resource Plans Example
CREATE RESOURCE PLAN daytime;
CREATE POOL daytime.bi WITH ALLOC_FRACTION=0.8, QUERY_PARALLELISM=5;
CREATE POOL daytime.etl WITH ALLOC_FRACTION=0.2, QUERY_PARALLELISM=20;
CREATE RULE downgrade IN daytime WHEN total_runtime > 3000 THEN MOVE etl;
ADD RULE downgrade TO bi;
CREATE APPLICATION MAPPING tableau in daytime TO bi;
ALTER PLAN daytime SET default pool= etl;
APPLY PLAN daytime;
daytime
bi: 80% etl: 20%
Downgrade when total_runtime>3000
16 © Hortonworks Inc. 2011–2018. All rights reserved
Connectors
17 © Hortonworks Inc. 2011–2018. All rights reserved
EDW Ingestion Pipeline
LLAP
interface
Kafka-Druid-
Hive ingest
Kafka-hive
streaming
ingest
Druid
ACID tables
Real-time analytics
• Druid answers in near real-time
• JDBC sources
• Kafka sources
Easy to use
• Query any data via LLAP
• No need to de-ACID tables
• No bucketing required
• Calcite talks SQL
• Materialization just works
• Cache just works
JDBC sources
MySQL, Postgres, Oracle
18 © Hortonworks Inc. 2011–2018. All rights reserved
Kafka Connector
Connect
● You say Stream I say Table!
● Define Time based View
Over the Stream
(e.g. last 15 mins)
● Enforce Hive authentication
& authorization out of the
box
Analyze
● Kafka metadata (key, partition,
offset, timestamp) as first
class columns.
● Time Traveling based on time
predicate.
● Seek to offset or partition
based on filter predicate.
Transform
● Join stream to stream
OR stream to table
● ACID offload data from
Kafka to Hive Exactly
once.
● Produce Data and write
it back to Kafka.
19 © Hortonworks Inc. 2011–2018. All rights reserved
Driver
MetaStore
HiveServer+Tez
LLAP DaemonsExecutors
Spark
Meta
Hive
Meta
HWC (JDBC)
Executors LLAP Daemons
1
2
3
1. Driver submits query to HiveServer
2. Compile query and return ”splits” to Driver
3. Execute query on LLAP
c) hive.executeQuery(“SELECT * FROM t”).sort(“A”).show()
ACID
Tables
20 © Hortonworks Inc. 2011–2018. All rights reserved
Driver HiveServer+Tez
LLAP DaemonsExecutors HWC (Arrow)
Executors LLAP Daemons
4
5
4. Executor Tasks run for each split
5. Tasks reads Arrow data from LLAP
6. HWC returns ArrowColumnVectors to Spark
6
c) hive.executeQuery(“SELECT * FROM t”).sort(“A”).show()
ACID
Tables
MetaStore
Spark
Meta
Hive
Meta
21 © Hortonworks Inc. 2011–2018. All rights reserved
JDBC Connector
• How did we build the information_schema?
• We mapped the metastore into Hive’s table
space!
• Uses Hive-JDBC connector
• Read-only for now
• Supports automatic pushdown of full
subqueries
• Cost-based optimizer decides part of query runs
in RDBMS versus Hive
• Joins, aggregates, filters, projections, etc
CREATE TABLE postgres_table (
id INT,
name varchar
);
CREATE EXTERNAL TABLE hive_table (
id INT,
name STRING
) STORED BY
'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
"hive.sql.database.type" = "POSTGRES",
"hive.sql.jdbc.driver"="org.postgresql.Driver",
"hive.sql.jdbc.url"="jdbc:postgresql://...",
"hive.sql.dbcp.username"="jdbctest",
"hive.sql.dbcp.password"="",
"hive.sql.query"="select * from postgres_table",
"hive.sql.column.mapping" = "id=ID, name=NAME",
"hive.jdbc.update.on.duplicate" = "true"
);
In Postgres
In Hive
22 © Hortonworks Inc. 2011–2018. All rights reserved
Usability: Data Analytics Studio
23 © Hortonworks Inc. 2011–2018. All rights reserved
One of the Extensible DataPlane Services
⬢ DAS 1.0 available now for HDP 3.0!
⬢ Monthly release cadence
⬢ Replaces Hive & Tez Views
⬢ Separate install from stack
Hortonworks Data Analytics Studio
HORTONWORKS DATAPLANE SERVICE
DATA SOURCE INTEGRATION
DATA SERVICES CATALOG
…DATA
LIFECYCLE
MANAGER
DATA
STEWARD
STUDIO
+OTHER
(partner)
SECURITY CONTROLS
CORE CAPABILITIES
MULTIPLE CLUSTERS AND SOURCES
MULTIHYBRID
*not yet available, coming soon
EXTENSIBLE SERVICES
IBM DSX*
DATA
ANALYTICS
STUDIO
24 © Hortonworks Inc. 2011–2018. All rights reserved.
Hortonworks confidential and proprietary information
SOLUTIONS: Pre-defined searches to quickly narrow
down problematic queries in a large cluster
25 © Hortonworks Inc. 2011–2018. All rights reserved.
Hortonworks confidential and proprietary information
SOLUTIONS: Full featured Auto-complete, results
direct download, quick-data preview and many other
quality-of-life improvements
26 © Hortonworks Inc. 2011–2018. All rights reserved.
Hortonworks confidential and proprietary information
SOLUTIONS: Heuristic recommendation engine
Fully self-serviced query and storage optimization
27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
SOLUTIONS: Data Analytics Studio gives database
heatmap, quickly discover and see what part of your
cluster is being utilized more
28 © Hortonworks Inc. 2011–2018. All rights reserved
Superset UI for Fast, Interactive Dashboards and Exploration
29 © Hortonworks Inc. 2011–2018. All rights reserved
Coming Soon
30 © Hortonworks Inc. 2011–2018. All rights reserved
⬢ Hive on Kubernetes solves:
– Hive/LLAP side install (to main cluster)
– Multiple versions of Hive
– Multiple warehouse & compute instances
– Dynamic configuration and secrets management
– Stateful and work preserving restarts (cache)
– Rolling restart for upgrades. Fast rollback to
previous good state.
Hive on Kubernetes (WIP)
Kubernetes Hosting Environments
AWS GCP
Data OS
CPU / MEMORY /
STORAGE
OPENSHIFTAZURE
CLOUD PROVIDERS
ON-
PREM/HYB
RID
DATA PLANE SERVICES
Cluster Lifecycle Manager Data Analytics Studio (DAS) Organizational Services
COMPUTE CLUSTER
SHARED SERVICES
Ranger
Atlas
Metastore
Tiller API Server
DAS Web Service
Query Coordinators
Query Executors
Registry
Blobstore
Indexe
r
RDBMS
Hive Server
Long-running kubernetes cluster
Inter-cluster communication Intra-cluster communication
Ingress Controller or
Load Balancer
Internal Service Endpoint for
ReplicaSet or StatefulSet
Ephemeral kubernetes cluster
31 © Hortonworks Inc. 2011–2018. All rights reserved
Questions?
32 © Hortonworks Inc. 2011–2018. All rights reserved
Thank You

Contenu connexe

Tendances

Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics MeetupIntroduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetupiwrigley
 
Under the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureUnder the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureScyllaDB
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...StreamNative
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
Accelerating Data Ingestion with Databricks Autoloader
Accelerating Data Ingestion with Databricks AutoloaderAccelerating Data Ingestion with Databricks Autoloader
Accelerating Data Ingestion with Databricks AutoloaderDatabricks
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisDvir Volk
 
Apache Spark overview
Apache Spark overviewApache Spark overview
Apache Spark overviewDataArt
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache RangerDataWorks Summit
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseDatabricks
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark Summit
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez Hortonworks
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Databricks
 
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...Dremio Corporation
 

Tendances (20)

Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics MeetupIntroduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
 
Under the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureUnder the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database Architecture
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Accelerating Data Ingestion with Databricks Autoloader
Accelerating Data Ingestion with Databricks AutoloaderAccelerating Data Ingestion with Databricks Autoloader
Accelerating Data Ingestion with Databricks Autoloader
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Apache Spark overview
Apache Spark overviewApache Spark overview
Apache Spark overview
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
 
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
 
Internal Hive
Internal HiveInternal Hive
Internal Hive
 
Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011
 

Similaire à Apache Hive 3.0: New SQL Features and Performance Enhancements

What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?DataWorks Summit
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoDataWorks Summit
 
LLAP: Building Cloud First BI
LLAP: Building Cloud First BILLAP: Building Cloud First BI
LLAP: Building Cloud First BIDataWorks Summit
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizonThejas Nair
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive DataWorks Summit
 
Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019alanfgates
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?DataWorks Summit
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizonArtem Ervits
 
Seamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive WarehouseSeamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive WarehouseDataWorks Summit
 
Seamless Replication and Disaster Recovery for Apache Hive Warehouse
Seamless Replication and Disaster Recovery for Apache Hive WarehouseSeamless Replication and Disaster Recovery for Apache Hive Warehouse
Seamless Replication and Disaster Recovery for Apache Hive WarehouseSankar H
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?DataWorks Summit
 
Apache Hive 2.0; SQL, Speed, Scale
Apache Hive 2.0; SQL, Speed, ScaleApache Hive 2.0; SQL, Speed, Scale
Apache Hive 2.0; SQL, Speed, ScaleHortonworks
 
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the CloudSpeed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloudgluent.
 
Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing
Apache Hive: From MapReduce to Enterprise-grade Big Data WarehousingApache Hive: From MapReduce to Enterprise-grade Big Data Warehousing
Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousingc-bslim
 
Moving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudMoving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudDataWorks Summit/Hadoop Summit
 
Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...
Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...
Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...Amazon Web Services
 
Hive acid and_2.x new_features
Hive acid and_2.x new_featuresHive acid and_2.x new_features
Hive acid and_2.x new_featuresAlberto Romero
 

Similaire à Apache Hive 3.0: New SQL Features and Performance Enhancements (20)

What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
 
LLAP: Building Cloud First BI
LLAP: Building Cloud First BILLAP: Building Cloud First BI
LLAP: Building Cloud First BI
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizon
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive
 
Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 
Seamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive WarehouseSeamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive Warehouse
 
Seamless Replication and Disaster Recovery for Apache Hive Warehouse
Seamless Replication and Disaster Recovery for Apache Hive WarehouseSeamless Replication and Disaster Recovery for Apache Hive Warehouse
Seamless Replication and Disaster Recovery for Apache Hive Warehouse
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?
 
Apache Hive 2.0; SQL, Speed, Scale
Apache Hive 2.0; SQL, Speed, ScaleApache Hive 2.0; SQL, Speed, Scale
Apache Hive 2.0; SQL, Speed, Scale
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
 
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the CloudSpeed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
 
Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing
Apache Hive: From MapReduce to Enterprise-grade Big Data WarehousingApache Hive: From MapReduce to Enterprise-grade Big Data Warehousing
Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing
 
Moving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudMoving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloud
 
Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...
Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...
Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...
 
Hive acid and_2.x new_features
Hive acid and_2.x new_featuresHive acid and_2.x new_features
Hive acid and_2.x new_features
 

Plus de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Plus de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Dernier

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 

Dernier (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Apache Hive 3.0: New SQL Features and Performance Enhancements

  • 1. 1 © Hortonworks Inc. 2011–2018. All rights reserved Apache Hive 3.0, A New Horizon Alan Gates Hortonworks Co-founder, Apache Hive PMC member @alanfgates
  • 2. 2 © Hortonworks Inc. 2011–2018. All rights reserved Apache Hive – Data Warehousing for Big Data • Comprehensive ANSI SQL • Only open source Hadoop SQL with transactions, INSERT/UPDATE/DELETE/MERGE • BI queries with MPP performance at big data scales • ETL jobs scale with your cluster • Enables per-user dynamic row and column security • Enables replication for HA and DR • Compatible with every major BI tool • Proven at 300+ PB scale
  • 3. 3 © Hortonworks Inc. 2011–2018. All rights reserved Hive on Tez Deep Storage Hadoop Cluster Tez Container Query Executors Tez Container Query Executors Tez Container Query Executors Tez Container Query Executors Tez AM Tez AM HiveServer2 (Query Endpoint) ODBC / JDBC SQL Queries HDFS and Compatible S3 WASB Isilon
  • 4. 4 © Hortonworks Inc. 2011–2018. All rights reserved Hive LLAP - MPP Performance at Hadoop Scale Deep Storage Hadoop Cluster LLAP Daemon Query Executors LLAP Daemon Query Executors LLAP Daemon Query Executors LLAP Daemon Query Executors Query Coordinators Coord- inator Coord- inator Coord- inator HiveServer2 (Query Endpoint) ODBC / JDBC SQL Queries In-Memory Cache (Shared Across All Users) HDFS and Compatible S3 WASB Isilon
  • 5. 5 © Hortonworks Inc. 2011–2018. All rights reserved Hive3: EDW Analyst Pipeline BI tools Materialized view Surrogate key Constraints Query Result Cache Workload management • Results return from HDFS/cache directly • Reduce load from repetitive queries • Allows more queries to be run in parallel • Reduce resource starvation in large clusters • Active/Passive HA • More “tools” for optimizer to use • More ”tools” for DBAs to tune/optimize • Invisible tuning of DB from users’ perspective • ACID v2 is as fast as regular tables • Hive 3 is optimized for S3/WASB/GCP • Support for JDBC/Kafka/Druid out of the box ACID v2 Cloud Storage Connectors
  • 6. 6 © Hortonworks Inc. 2011–2018. All rights reserved New SQL Features
  • 7. 7 © Hortonworks Inc. 2011–2018. All rights reserved Transactional Read and Write • Originally Hive supported write only by adding partitions or loading new files into existing partitions • Starting in version 0.13, Hive added transactions and INSERT, UPDATE, DELETE • Supports • Slow changing dimensions • Correcting mis-loaded data • GDPR's right to be forgotten • Not OLTP! • Drawbacks: • Transactional tables had to be stored in ORC and had to be bucketed • Reading transactional tables was significantly slower than non-transactional • No support for MERGE or UPSERT functionality
  • 8. 8 © Hortonworks Inc. 2011–2018. All rights reserved ACID v2 • In 3.0 ACID storage has been reworked • Performance penalty for ACID is now negligible even when compactor has not run • With other optimizations ACID can result in speed up (more on this in the performance talk) • Added MERGE support • CDC can be regularly merged into a fact table with upsert functionality • Removed restrictions: • Tables no longer have to be bucketed • Non-ORC based tables supported (INSERT & SELECT only) • Still not OLTP!
  • 9. 9 © Hortonworks Inc. 2011–2018. All rights reserved Constraints & Defaults • Helps optimizer to produce better plans • BI tool integrations • Data Integrity • hive.constraint.notnull.enforce = true • SQL compatibility & offload scenarios Example: CREATE TABLE Persons ( ID Int NOT NULL, Name String NOT NULL, Age Int, Creator String DEFAULT CURRENT_USER(), CreateDate Date DEFAULT CURRENT_DATE(), PRIMARY KEY (ID) DISABLE NOVALIDATE ); CREATE TABLE BusinessUnit ( ID Int NOT NULL, Head Int NOT NULL, Creator String DEFAULT CURRENT_USER(), CreateDate Date DEFAULT CURRENT_DATE(), PRIMARY KEY (ID) DISABLE NOVALIDATE, CONSTRAINT fk FOREIGN KEY (Head) REFERENCES Persons(ID) DISABLE NOVALIDATE );
  • 10. 10 © Hortonworks Inc. 2011–2018. All rights reserved Hive Native Replication • REPL commands added to support replication • Replication currently done at database level (all tables etc. in the db) • Copies data together with metadata • Master/slave on db level • When first setup, existing data copied • Then incremental replication - only copies changes • Hive itself provides primitives for replication, not active daemons • Used by Hortonworks Data Lifecycle Manager to provide High Availability and Disaster Recovery • Replication can be between two clusters or between cluster and cloud
  • 11. 11 © Hortonworks Inc. 2011–2018. All rights reserved Plus More • Materialized Views with refresh (more in the performance talk) • Surrogate keys – default values, unique, not monotonically increasing • SQL Standard Information Schema now supported • Ranger can now enforce authorization policies for use of global non-builtin UDFs • Support for TIMESTAMP WITH TIMEZONE data type • In Hive 3 much work has been done to optimize Hive for object stores • Hive uses its ACID system to determine which files to read rather than trust the storage • Moves eliminated where ever possible • More aggressive caching of file metadata and data to reduce file system operations • Apache Parquet and text files now supported in LLAP
  • 12. 12 © Hortonworks Inc. 2011–2018. All rights reserved Workload Management
  • 13. 13 © Hortonworks Inc. 2011–2018. All rights reserved LLAP Workload Management • Effectively share LLAP cluster resources • Resource allocation per user policy; separate ETL and BI, etc. • Resources based guardrails • Protect against long running queries, high memory usage • Improved, query-aware scheduling • Scheduler is aware of query characteristics, types, etc. • Fragments easy to pre-empt compared to containers • Queries get guaranteed fractions of the cluster, but can use empty space
  • 14. 14 © Hortonworks Inc. 2011–2018. All rights reserved Guardrail Example Common Triggers ● ELAPSED_TIME ● EXECUTION_TIME ● TOTAL_TASKS ● HDFS_BYTES_READ, HDFS_BYTES_WRITTEN ● CREATED FILES ● CREATED_DYNAMIC_PARTITIONS Example CREATE RESOURCE PLAN guardrail; CREATE TRIGGER guardrail.long_running WHEN EXECUTION_TIME > 2000 DO KILL; ALTER TRIGGER guardrail.long_running ADD TO UNMANAGED; ALTER RESOURCE PLAN guardrail ENABLE ACTIVATE;
  • 15. 15 © Hortonworks Inc. 2011–2018. All rights reserved Resource Plans Example CREATE RESOURCE PLAN daytime; CREATE POOL daytime.bi WITH ALLOC_FRACTION=0.8, QUERY_PARALLELISM=5; CREATE POOL daytime.etl WITH ALLOC_FRACTION=0.2, QUERY_PARALLELISM=20; CREATE RULE downgrade IN daytime WHEN total_runtime > 3000 THEN MOVE etl; ADD RULE downgrade TO bi; CREATE APPLICATION MAPPING tableau in daytime TO bi; ALTER PLAN daytime SET default pool= etl; APPLY PLAN daytime; daytime bi: 80% etl: 20% Downgrade when total_runtime>3000
  • 16. 16 © Hortonworks Inc. 2011–2018. All rights reserved Connectors
  • 17. 17 © Hortonworks Inc. 2011–2018. All rights reserved EDW Ingestion Pipeline LLAP interface Kafka-Druid- Hive ingest Kafka-hive streaming ingest Druid ACID tables Real-time analytics • Druid answers in near real-time • JDBC sources • Kafka sources Easy to use • Query any data via LLAP • No need to de-ACID tables • No bucketing required • Calcite talks SQL • Materialization just works • Cache just works JDBC sources MySQL, Postgres, Oracle
  • 18. 18 © Hortonworks Inc. 2011–2018. All rights reserved Kafka Connector Connect ● You say Stream I say Table! ● Define Time based View Over the Stream (e.g. last 15 mins) ● Enforce Hive authentication & authorization out of the box Analyze ● Kafka metadata (key, partition, offset, timestamp) as first class columns. ● Time Traveling based on time predicate. ● Seek to offset or partition based on filter predicate. Transform ● Join stream to stream OR stream to table ● ACID offload data from Kafka to Hive Exactly once. ● Produce Data and write it back to Kafka.
  • 19. 19 © Hortonworks Inc. 2011–2018. All rights reserved Driver MetaStore HiveServer+Tez LLAP DaemonsExecutors Spark Meta Hive Meta HWC (JDBC) Executors LLAP Daemons 1 2 3 1. Driver submits query to HiveServer 2. Compile query and return ”splits” to Driver 3. Execute query on LLAP c) hive.executeQuery(“SELECT * FROM t”).sort(“A”).show() ACID Tables
  • 20. 20 © Hortonworks Inc. 2011–2018. All rights reserved Driver HiveServer+Tez LLAP DaemonsExecutors HWC (Arrow) Executors LLAP Daemons 4 5 4. Executor Tasks run for each split 5. Tasks reads Arrow data from LLAP 6. HWC returns ArrowColumnVectors to Spark 6 c) hive.executeQuery(“SELECT * FROM t”).sort(“A”).show() ACID Tables MetaStore Spark Meta Hive Meta
  • 21. 21 © Hortonworks Inc. 2011–2018. All rights reserved JDBC Connector • How did we build the information_schema? • We mapped the metastore into Hive’s table space! • Uses Hive-JDBC connector • Read-only for now • Supports automatic pushdown of full subqueries • Cost-based optimizer decides part of query runs in RDBMS versus Hive • Joins, aggregates, filters, projections, etc CREATE TABLE postgres_table ( id INT, name varchar ); CREATE EXTERNAL TABLE hive_table ( id INT, name STRING ) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( "hive.sql.database.type" = "POSTGRES", "hive.sql.jdbc.driver"="org.postgresql.Driver", "hive.sql.jdbc.url"="jdbc:postgresql://...", "hive.sql.dbcp.username"="jdbctest", "hive.sql.dbcp.password"="", "hive.sql.query"="select * from postgres_table", "hive.sql.column.mapping" = "id=ID, name=NAME", "hive.jdbc.update.on.duplicate" = "true" ); In Postgres In Hive
  • 22. 22 © Hortonworks Inc. 2011–2018. All rights reserved Usability: Data Analytics Studio
  • 23. 23 © Hortonworks Inc. 2011–2018. All rights reserved One of the Extensible DataPlane Services ⬢ DAS 1.0 available now for HDP 3.0! ⬢ Monthly release cadence ⬢ Replaces Hive & Tez Views ⬢ Separate install from stack Hortonworks Data Analytics Studio HORTONWORKS DATAPLANE SERVICE DATA SOURCE INTEGRATION DATA SERVICES CATALOG …DATA LIFECYCLE MANAGER DATA STEWARD STUDIO +OTHER (partner) SECURITY CONTROLS CORE CAPABILITIES MULTIPLE CLUSTERS AND SOURCES MULTIHYBRID *not yet available, coming soon EXTENSIBLE SERVICES IBM DSX* DATA ANALYTICS STUDIO
  • 24. 24 © Hortonworks Inc. 2011–2018. All rights reserved. Hortonworks confidential and proprietary information SOLUTIONS: Pre-defined searches to quickly narrow down problematic queries in a large cluster
  • 25. 25 © Hortonworks Inc. 2011–2018. All rights reserved. Hortonworks confidential and proprietary information SOLUTIONS: Full featured Auto-complete, results direct download, quick-data preview and many other quality-of-life improvements
  • 26. 26 © Hortonworks Inc. 2011–2018. All rights reserved. Hortonworks confidential and proprietary information SOLUTIONS: Heuristic recommendation engine Fully self-serviced query and storage optimization
  • 27. 27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved SOLUTIONS: Data Analytics Studio gives database heatmap, quickly discover and see what part of your cluster is being utilized more
  • 28. 28 © Hortonworks Inc. 2011–2018. All rights reserved Superset UI for Fast, Interactive Dashboards and Exploration
  • 29. 29 © Hortonworks Inc. 2011–2018. All rights reserved Coming Soon
  • 30. 30 © Hortonworks Inc. 2011–2018. All rights reserved ⬢ Hive on Kubernetes solves: – Hive/LLAP side install (to main cluster) – Multiple versions of Hive – Multiple warehouse & compute instances – Dynamic configuration and secrets management – Stateful and work preserving restarts (cache) – Rolling restart for upgrades. Fast rollback to previous good state. Hive on Kubernetes (WIP) Kubernetes Hosting Environments AWS GCP Data OS CPU / MEMORY / STORAGE OPENSHIFTAZURE CLOUD PROVIDERS ON- PREM/HYB RID DATA PLANE SERVICES Cluster Lifecycle Manager Data Analytics Studio (DAS) Organizational Services COMPUTE CLUSTER SHARED SERVICES Ranger Atlas Metastore Tiller API Server DAS Web Service Query Coordinators Query Executors Registry Blobstore Indexe r RDBMS Hive Server Long-running kubernetes cluster Inter-cluster communication Intra-cluster communication Ingress Controller or Load Balancer Internal Service Endpoint for ReplicaSet or StatefulSet Ephemeral kubernetes cluster
  • 31. 31 © Hortonworks Inc. 2011–2018. All rights reserved Questions?
  • 32. 32 © Hortonworks Inc. 2011–2018. All rights reserved Thank You