SlideShare une entreprise Scribd logo
1  sur  43
1© Cloudera, Inc. All rights reserved.
Todd Lipcon on behalf of the Kudu team
Kudu: Resolving Transactional
and Analytic Trade-offs in
Hadoop
1
2© Cloudera, Inc. All rights reserved.
The conference for and by Data Scientists, from startup to enterprise
wrangleconf.com
Public registration is now open!
Who: Featuring data scientists from Salesforce,
Uber, Pinterest, and more
When: Thursday, October 22, 2015
Where: Broadway Studios, San Francisco
3© Cloudera, Inc. All rights reserved.
Kudu
Storage for Fast Analytics on Fast Data
• New updating column store for
Hadoop
• Apache-licensed open source
• Beta now available
Columnar Store
Kudu
4© Cloudera, Inc. All rights reserved.
Motivation and Goals
Why build Kudu?
4
5© Cloudera, Inc. All rights reserved.
Motivating Questions
• Are there user problems that can we can’t address because of gaps in Hadoop
ecosystem storage technologies?
• Are we positioned to take advantage of advancements in the hardware
landscape?
6© Cloudera, Inc. All rights reserved.
Current Storage Landscape in Hadoop
HDFS excels at:
• Efficiently scanning large amounts
of data
• Accumulating data with high
throughput
HBase excels at:
• Efficiently finding and writing
individual rows
• Making data mutable
Gaps exist when these properties
are needed simultaneously
7© Cloudera, Inc. All rights reserved.
• High throughput for big scans (columnar
storage and replication)
Goal: Within 2x of Parquet
• Low-latency for short accesses (primary key
indexes and quorum replication)
Goal: 1ms read/write on SSD
• Database-like semantics (initially single-row
ACID)
• Relational data model
• SQL query
• “NoSQL” style scan/insert/update (Java client)
Kudu Design Goals
8© Cloudera, Inc. All rights reserved.
Changing Hardware landscape
• Spinning disk -> solid state storage
• NAND flash: Up to 450k read 250k write iops, about 2GB/sec read and
1.5GB/sec write throughput, at a price of less than $3/GB and dropping
• 3D XPoint memory (1000x faster than NAND, cheaper than RAM)
• RAM is cheaper and more abundant:
• 64->128->256GB over last few years
• Takeaway 1: The next bottleneck is CPU, and current storage systems weren’t
designed with CPU efficiency in mind.
• Takeaway 2: Column stores are feasible for random access
9© Cloudera, Inc. All rights reserved.
Kudu Usage
• Table has a SQL-like schema
• Finite number of columns (unlike HBase/Cassandra)
• Types: BOOL, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, STRING, BINARY,
TIMESTAMP
• Some subset of columns makes up a possibly-composite primary key
• Fast ALTER TABLE
• Java and C++ “NoSQL” style APIs
• Insert(), Update(), Delete(), Scan()
• Integrations with MapReduce, Spark, and Impala
• more to come!
9
10© Cloudera, Inc. All rights reserved.
Use cases and architectures
11© Cloudera, Inc. All rights reserved.
Kudu Use Cases
Kudu is best for use cases requiring a simultaneous combination of
sequential and random reads and writes
● Time Series
○ Examples: Stream market data; fraud detection & prevention; risk monitoring
○ Workload: Insert, updates, scans, lookups
● Machine Data Analytics
○ Examples: Network threat detection
○ Workload: Inserts, scans, lookups
● Online Reporting
○ Examples: ODS
○ Workload: Inserts, updates, scans, lookups
12© Cloudera, Inc. All rights reserved.
Real-Time Analytics in Hadoop Today
Fraud Detection in the Real World = Storage Complexity
Considerations:
● How do I handle failure
during this process?
● How often do I reorganize
data streaming in into a
format appropriate for
reporting?
● When reporting, how do I see
data that has not yet been
reorganized?
● How do I ensure that
important jobs aren’t
interrupted by maintenance?
New Partition
Most Recent Partition
Historic Data
HBase
Parquet
File
Have we
accumulated
enough data?
Reorganize
HBase file
into Parquet
• Wait for running operations to complete
• Define new Impala partition referencing
the newly written Parquet file
Incoming Data
(Messaging
System)
Reporting
Request
Impala on HDFS
13© Cloudera, Inc. All rights reserved.
Real-Time Analytics in Hadoop with Kudu
Improvements:
● One system to operate
● No cron jobs or background
processes
● Handle late arrivals or data
corrections with ease
● New data available
immediately for analytics or
operations
Historical and Real-time
Data
Incoming Data
(Messaging
System)
Reporting
Request
Storage in Kudu
14© Cloudera, Inc. All rights reserved.
How it works
14
15© Cloudera, Inc. All rights reserved.
Tables and Tablets
• Table is horizontally partitioned into tablets
• Range or hash partitioning
• PRIMARY KEY (host, metric, timestamp) DISTRIBUTE BY
HASH(timestamp) INTO 100 BUCKETS
• Each tablet has N replicas (3 or 5), with Raft consensus
• Allow read from any replica, plus leader-driven writes with low MTTR
• Tablet servers host tablets
• Store data on local disks (no HDFS)
15
16© Cloudera, Inc. All rights reserved.
Metadata
• Replicated master*
• Acts as a tablet directory (“META” table)
• Acts as a catalog (table schemas, etc)
• Acts as a load balancer (tracks TS liveness, re-replicates under-replicated
tablets)
• Caches all metadata in RAM for high performance
• 80-node load test, GetTableLocations RPC perf:
• 99th percentile: 68us, 99.99th percentile: 657us
• <2% peak CPU usage
• Client configured with master addresses
• Asks master for tablet locations as needed and caches them
16
17© Cloudera, Inc. All rights reserved.
18© Cloudera, Inc. All rights reserved.
Raft consensus
18
TS A
Tablet 1
(LEADER)
Client
TS B
Tablet 1
(FOLLOWER)
TS C
Tablet 1
(FOLLOWER)
WAL
WALWAL
2b. Leader writes local WAL
1a. Client->Leader: Write() RPC
2a. Leader->Followers:
UpdateConsensus() RPC
3. Follower: write WAL
4. Follower->Leader: success
3. Follower: write WAL
5. Leader has achieved majority
6. Leader->Client: Success!
19© Cloudera, Inc. All rights reserved.
Fault tolerance
• Transient FOLLOWER failure:
• Leader can still achieve majority
• Restart follower TS within 5 min and it will rejoin transparently
• Transient LEADER failure:
• Followers expect to hear a heartbeat from their leader every 1.5 seconds
• 3 missed heartbeats: leader election!
• New LEADER is elected from remaining nodes within a few seconds
• Restart within 5 min and it rejoins as a FOLLOWER
• N replicas handle (N-1)/2 failures
19
20© Cloudera, Inc. All rights reserved.
Fault tolerance (2)
• Permanent failure:
• Leader notices that a follower has been dead for 5 minutes
• Evicts that follower
• Master selects a new replica
• Leader copies the data over to the new one, which joins as a new FOLLOWER
20
21© Cloudera, Inc. All rights reserved.
Tablet design
• Inserts buffered in an in-memory store (like HBase’s memstore)
• Flushed to disk
• Columnar layout, similar to Apache Parquet
• Updates use MVCC (updates tagged with timestamp, not in-place)
• Allow “SELECT AS OF <timestamp>” queries and consistent cross-tablet scans
• Near-optimal read path for “current time” scans
• No per row branches, fast vectorized decoding and predicate evaluation
• Performance worsens based on number of recent updates
21
22© Cloudera, Inc. All rights reserved.
LSM vs Kudu
• LSM – Log Structured Merge (Cassandra, HBase, etc)
• Inserts and updates all go to an in-memory map (MemStore) and later flush to
on-disk files (HFile/SSTable)
• Reads perform an on-the-fly merge of all on-disk HFiles
• Kudu
• Shares some traits (memstores, compactions)
• More complex.
• Slower writes in exchange for faster reads (especially scans)
• During tonight’s break-out sessions, I can go into excruciating detail
22
23© Cloudera, Inc. All rights reserved.
Kudu trade-offs
• Random updates will be slower
• HBase model allows random updates without incurring a disk seek
• Kudu requires a key lookup before update, bloom lookup before insert, may
incur seeks
• Single-row reads may be slower
• Columnar design is optimized for scans
• Especially slow at reading a row that has had many recent updates (e.g YCSB
“zipfian”)
23
24© Cloudera, Inc. All rights reserved.
Benchmarks
24
25© Cloudera, Inc. All rights reserved.
TPC-H (Analytics benchmark)
• 75TS + 1 master cluster
• 12 (spinning) disk each, enough RAM to fit dataset
• Using Kudu 0.5.0, Impala 2.2 with Kudu support, CDH 5.4
• TPC-H Scale Factor 100 (100GB)
• Example query:
• SELECT n_name, sum(l_extendedprice * (1 - l_discount)) as revenue FROM customer,
orders, lineitem, supplier, nation, region WHERE c_custkey = o_custkey AND
l_orderkey = o_orderkey AND l_suppkey = s_suppkey AND c_nationkey = s_nationkey
AND s_nationkey = n_nationkey AND n_regionkey = r_regionkey AND r_name = 'ASIA'
AND o_orderdate >= date '1994-01-01' AND o_orderdate < '1995-01-01’ GROUP BY
n_name ORDER BY revenue desc;
25
26© Cloudera, Inc. All rights reserved.
- Kudu outperforms Parquet by 31% (geometric mean) for RAM-resident data
- Parquet likely to outperform Kudu for HDD-resident (larger IO requests)
27© Cloudera, Inc. All rights reserved.
What about Apache Phoenix?
• 10 node cluster (9 worker, 1 master)
• HBase 1.0, Phoenix 4.3
• TPC-H LINEITEM table only (6B rows)
27
2152
219
76
131
0.04
1918
13.2
1.7
0.7
0.15
155
9.3
1.4 1.5 1.37
0.01
0.1
1
10
100
1000
10000
Load TPCH Q1 COUNT(*)
COUNT(*)
WHERE…
single-row
lookup
Time(sec)
Phoenix
Kudu
Parquet
28© Cloudera, Inc. All rights reserved.
What about NoSQL-style random access? (YCSB)
• YCSB 0.5.0-snapshot
• 10 node cluster
(9 worker, 1 master)
• HBase 1.0
• 100M rows, 10M ops
28
29© Cloudera, Inc. All rights reserved.
But don’t trust me (a vendor)…
29
30© Cloudera, Inc. All rights reserved.
About Xiaomi
Mobile Internet Company Founded in 2010
Smartphones Software
E-commerce
MIUI
Cloud Services
App Store/Game
Payment/Finance
…
Smart Home
Smart Devices
31© Cloudera, Inc. All rights reserved.
Big Data Analytics Pipeline
Before Kudu
• Long pipeline
high latency(1 hour ~ 1 day), data conversion pains
• No ordering
Log arrival(storage) order not exactly logical order
e.g. read 2-3 days of log for data in 1 day
32© Cloudera, Inc. All rights reserved.
Big Data Analysis Pipeline
Simplified With Kudu
• ETL Pipeline(0~10s latency)
Apps that need to prevent backpressure or require ETL
• Direct Pipeline(no latency)
Apps that don’t require ETL and no backpressure issues
OLAP scan
Side table lookup
Result store
33© Cloudera, Inc. All rights reserved.
Use Case 1
Mobile service monitoring and tracing tool
Requirements
 High write throughput
>5 Billion records/day and growing
 Query latest data and quick response
Identify and resolve issues quickly
 Can search for individual records
Easy for troubleshooting
Gather important RPC tracing events from mobile
app and backend service.
Service monitoring & troubleshooting tool.
34© Cloudera, Inc. All rights reserved.
Use Case 1: Benchmark
Environment
 71 Node cluster
 Hardware
CPU: E5-2620 2.1GHz * 24 core Memory: 64GB
Network: 1Gb Disk: 12 HDD
 Software
Hadoop2.6/Impala 2.1/Kudu
Data
 1 day of server side tracing data
~2.6 Billion rows
~270 bytes/row
17 columns, 5 key columns
35© Cloudera, Inc. All rights reserved.
Use Case 1: Benchmark Results
1.4 2.0 2.3
3.1
1.3 0.91.3
2.8
4.0
5.7
7.5
16.7
Q1 Q2 Q3 Q4 Q5 Q6
kudu
parquet
Total Time(s) Throughput(Total) Throughput(per node)
Kudu 961.1 2.8M record/s 39.5k record/s
Parquet 114.6 23.5M record/s 331k records/s
Bulk load using impala (INSERT INTO):
Query latency:
* HDFS parquet file replication = 3 , kudu table replication = 3
* Each query run 5 times then take average
36© Cloudera, Inc. All rights reserved.
Use Case 1: Result Analysis
 Lazy materialization
Ideal for search style query
Q6 returns only a few records (of a single user) with all columns
 Scan range pruning using primary index
Predicates on primary key
Q5 only scans 1 hour of data
 Future work
Primary index: speed-up order by and distinct
Hash Partitioning: speed-up count(distinct), no need for global
shuffle/merge
37© Cloudera, Inc. All rights reserved.
Use Case 2
OLAP PaaS for ecosystem cloud
 Provide big data service for smart hardware startups (Xiaomi’s
ecosystem members)
 OLAP database with some OLTP features
 Manage/Ingest/query your data and serving results in one place
Backend/Mobile App/Smart Device/IoT …
38© Cloudera, Inc. All rights reserved.
What Kudu is not
38
39© Cloudera, Inc. All rights reserved.
Kudu is…
• NOT a SQL database
• “BYO SQL”
• NOT a filesystem
• data must have tabular structure
• NOT a replacement for HBase or HDFS
• Cloudera continues to invest in those systems
• Many use cases where they’re still more appropriate
• NOT an in-memory database
• Very fast for memory-sized workloads, but can operate on larger data too!
39
40© Cloudera, Inc. All rights reserved.
Getting started
40
41© Cloudera, Inc. All rights reserved.
Getting started as a user
• http://getkudu.io
• kudu-user@googlegroups.com
• Quickstart VM
• Easiest way to get started
• Impala and Kudu in an easy-to-install VM
• CSD and Parcels
• For installation on a Cloudera Manager-managed cluster
41
42© Cloudera, Inc. All rights reserved.
Getting started as a developer
• http://github.com/cloudera/kudu
• All commits go here first
• Public gerrit: http://gerrit.cloudera.org
• All code reviews happening here
• Public JIRA: http://issues.cloudera.org
• Includes bugs going back to 2013. Come see our dirty laundry!
• kudu-dev@googlegroups.com
• Apache 2.0 license open source
• Contributions are welcome and encouraged!
42
43© Cloudera, Inc. All rights reserved.
http://getkudu.io/
@getkudu

Contenu connexe

Tendances

cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platformcloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
cloudera Apache Kudu Updatable Analytical Storage for Modern Data PlatformRakuten Group, Inc.
 
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...Yahoo Developer Network
 
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupMike Percy
 
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Data Con LA
 
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...Dataconomy Media
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best PracticesVenu Anuganti
 
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data  relational storage (Strata NYC 2017)A brave new world in mutable big data  relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)Todd Lipcon
 
Cassandra Tuning - above and beyond
Cassandra Tuning - above and beyondCassandra Tuning - above and beyond
Cassandra Tuning - above and beyondMatija Gobec
 
Introducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupIntroducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupCaserta
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Data Con LA
 
Apache HBase in the Enterprise Data Hub at Cerner
Apache HBase in the Enterprise Data Hub at CernerApache HBase in the Enterprise Data Hub at Cerner
Apache HBase in the Enterprise Data Hub at CernerHBaseCon
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataHakka Labs
 
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...DataStax
 
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...DataStax
 
High Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureHigh Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureDataWorks Summit
 
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and KuduBuilding Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and KuduJeremy Beard
 
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...Yahoo Developer Network
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 

Tendances (20)

cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platformcloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
 
Introducing Kudu
Introducing KuduIntroducing Kudu
Introducing Kudu
 
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
 
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application Meetup
 
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
 
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best Practices
 
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data  relational storage (Strata NYC 2017)A brave new world in mutable big data  relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)
 
Kudu demo
Kudu demoKudu demo
Kudu demo
 
Cassandra Tuning - above and beyond
Cassandra Tuning - above and beyondCassandra Tuning - above and beyond
Cassandra Tuning - above and beyond
 
Introducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupIntroducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing Meetup
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
 
Apache HBase in the Enterprise Data Hub at Cerner
Apache HBase in the Enterprise Data Hub at CernerApache HBase in the Enterprise Data Hub at Cerner
Apache HBase in the Enterprise Data Hub at Cerner
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
 
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...
 
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
 
High Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureHigh Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and Future
 
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and KuduBuilding Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
 
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 

En vedette

Mastan talent acquisition (1)
Mastan talent acquisition (1)Mastan talent acquisition (1)
Mastan talent acquisition (1)Mastanvali Shaik
 
Life isn't fair...
Life isn't fair...Life isn't fair...
Life isn't fair...Susan Smith
 
La música
La músicaLa música
La músicamerymill
 
Gastcollege Vrije Universiteit Brussel
Gastcollege Vrije Universiteit BrusselGastcollege Vrije Universiteit Brussel
Gastcollege Vrije Universiteit BrusselCarlo Vuijlsteke
 
Jak rozwiązać problem w godzinę metodą Design Studio?
Jak rozwiązać problem w godzinę metodą Design Studio?Jak rozwiązać problem w godzinę metodą Design Studio?
Jak rozwiązać problem w godzinę metodą Design Studio?Joanna Ostafin
 
Bangladesh vs carlisle flooding facts
Bangladesh vs carlisle flooding facts Bangladesh vs carlisle flooding facts
Bangladesh vs carlisle flooding facts Nishay Patel
 
Promoting creativity in the classroom ppt presentation
Promoting creativity in the classroom ppt presentationPromoting creativity in the classroom ppt presentation
Promoting creativity in the classroom ppt presentationChelsea Bibeau
 
Types of coastal protection
Types of coastal protectionTypes of coastal protection
Types of coastal protectionNishay Patel
 
Modulo matematica eba
Modulo matematica ebaModulo matematica eba
Modulo matematica ebaIrfa Peru
 
Síndrome Mielodisplasico
Síndrome Mielodisplasico Síndrome Mielodisplasico
Síndrome Mielodisplasico José Leonis
 
Guy Barrette: Nouveau portail Azure et la console Kudu
Guy Barrette: Nouveau portail Azure et la console KuduGuy Barrette: Nouveau portail Azure et la console Kudu
Guy Barrette: Nouveau portail Azure et la console KuduMSDEVMTL
 

En vedette (18)

Desempleo en la comarca lagunera
Desempleo en la comarca laguneraDesempleo en la comarca lagunera
Desempleo en la comarca lagunera
 
D. KARTHICK RESUME_02.12.2015
D. KARTHICK RESUME_02.12.2015D. KARTHICK RESUME_02.12.2015
D. KARTHICK RESUME_02.12.2015
 
Mastan talent acquisition (1)
Mastan talent acquisition (1)Mastan talent acquisition (1)
Mastan talent acquisition (1)
 
Life isn't fair...
Life isn't fair...Life isn't fair...
Life isn't fair...
 
Market research
Market researchMarket research
Market research
 
Agile Software development
Agile Software developmentAgile Software development
Agile Software development
 
I-Creative
I-CreativeI-Creative
I-Creative
 
La música
La músicaLa música
La música
 
Gastcollege Vrije Universiteit Brussel
Gastcollege Vrije Universiteit BrusselGastcollege Vrije Universiteit Brussel
Gastcollege Vrije Universiteit Brussel
 
Jak rozwiązać problem w godzinę metodą Design Studio?
Jak rozwiązać problem w godzinę metodą Design Studio?Jak rozwiązać problem w godzinę metodą Design Studio?
Jak rozwiązać problem w godzinę metodą Design Studio?
 
Sheryl sandberg
Sheryl sandbergSheryl sandberg
Sheryl sandberg
 
Bangladesh vs carlisle flooding facts
Bangladesh vs carlisle flooding facts Bangladesh vs carlisle flooding facts
Bangladesh vs carlisle flooding facts
 
Promoting creativity in the classroom ppt presentation
Promoting creativity in the classroom ppt presentationPromoting creativity in the classroom ppt presentation
Promoting creativity in the classroom ppt presentation
 
Types of coastal protection
Types of coastal protectionTypes of coastal protection
Types of coastal protection
 
Modulo matematica eba
Modulo matematica ebaModulo matematica eba
Modulo matematica eba
 
Síndrome Mielodisplasico
Síndrome Mielodisplasico Síndrome Mielodisplasico
Síndrome Mielodisplasico
 
Guy Barrette: Nouveau portail Azure et la console Kudu
Guy Barrette: Nouveau portail Azure et la console KuduGuy Barrette: Nouveau portail Azure et la console Kudu
Guy Barrette: Nouveau portail Azure et la console Kudu
 
STRESS AND COPING STYLE OF URBAN AND RURAL ADOLESCENTS
STRESS AND COPING STYLE OF URBAN AND RURAL ADOLESCENTSSTRESS AND COPING STYLE OF URBAN AND RURAL ADOLESCENTS
STRESS AND COPING STYLE OF URBAN AND RURAL ADOLESCENTS
 

Similaire à SFHUG Kudu Talk

Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast DataKudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast DataCloudera, Inc.
 
Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016StampedeCon
 
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...Hadoop / Spark Conference Japan
 
Spark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike PercySpark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike PercySpark Summit
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 
Apache Kudu - Updatable Analytical Storage #rakutentech
Apache Kudu - Updatable Analytical Storage #rakutentechApache Kudu - Updatable Analytical Storage #rakutentech
Apache Kudu - Updatable Analytical Storage #rakutentechCloudera Japan
 
Strata London 2019 Scaling Impala.pptx
Strata London 2019 Scaling Impala.pptxStrata London 2019 Scaling Impala.pptx
Strata London 2019 Scaling Impala.pptxManish Maheshwari
 
Strata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaStrata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaManish Maheshwari
 
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...Cloudera, Inc.
 
Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Wei-Chiu Chuang
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaCloudera, Inc.
 
Impala tech-talk by Dimitris Tsirogiannis
Impala tech-talk by Dimitris TsirogiannisImpala tech-talk by Dimitris Tsirogiannis
Impala tech-talk by Dimitris TsirogiannisFelicia Haggarty
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaSwiss Big Data User Group
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Cloudera, Inc.
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impalahuguk
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014cdmaxime
 

Similaire à SFHUG Kudu Talk (20)

Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast DataKudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
 
Kudu austin oct 2015.pptx
Kudu austin oct 2015.pptxKudu austin oct 2015.pptx
Kudu austin oct 2015.pptx
 
Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016
 
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
 
Spark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike PercySpark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike Percy
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
Apache Kudu - Updatable Analytical Storage #rakutentech
Apache Kudu - Updatable Analytical Storage #rakutentechApache Kudu - Updatable Analytical Storage #rakutentech
Apache Kudu - Updatable Analytical Storage #rakutentech
 
Strata London 2019 Scaling Impala.pptx
Strata London 2019 Scaling Impala.pptxStrata London 2019 Scaling Impala.pptx
Strata London 2019 Scaling Impala.pptx
 
What's New in Apache Hive
What's New in Apache HiveWhat's New in Apache Hive
What's New in Apache Hive
 
Strata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaStrata London 2019 Scaling Impala
Strata London 2019 Scaling Impala
 
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
 
Fast Analytics
Fast Analytics Fast Analytics
Fast Analytics
 
Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
Impala tech-talk by Dimitris Tsirogiannis
Impala tech-talk by Dimitris TsirogiannisImpala tech-talk by Dimitris Tsirogiannis
Impala tech-talk by Dimitris Tsirogiannis
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
 

Plus de Felicia Haggarty

8 Tips for Deploying DevSecOps
8 Tips for Deploying DevSecOps8 Tips for Deploying DevSecOps
8 Tips for Deploying DevSecOpsFelicia Haggarty
 
Yarn presentation - DFW CUG - December 2015
Yarn presentation - DFW CUG - December 2015Yarn presentation - DFW CUG - December 2015
Yarn presentation - DFW CUG - December 2015Felicia Haggarty
 
Building a system for machine and event-oriented data - SF HUG Nov 2015
Building a system for machine and event-oriented data - SF HUG Nov 2015Building a system for machine and event-oriented data - SF HUG Nov 2015
Building a system for machine and event-oriented data - SF HUG Nov 2015Felicia Haggarty
 
Data revolution by Doug Cutting
Data revolution by Doug CuttingData revolution by Doug Cutting
Data revolution by Doug CuttingFelicia Haggarty
 
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiWhither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiFelicia Haggarty
 

Plus de Felicia Haggarty (6)

8 Tips for Deploying DevSecOps
8 Tips for Deploying DevSecOps8 Tips for Deploying DevSecOps
8 Tips for Deploying DevSecOps
 
Yarn presentation - DFW CUG - December 2015
Yarn presentation - DFW CUG - December 2015Yarn presentation - DFW CUG - December 2015
Yarn presentation - DFW CUG - December 2015
 
Building a system for machine and event-oriented data - SF HUG Nov 2015
Building a system for machine and event-oriented data - SF HUG Nov 2015Building a system for machine and event-oriented data - SF HUG Nov 2015
Building a system for machine and event-oriented data - SF HUG Nov 2015
 
IoT Austin CUG talk
IoT Austin CUG talkIoT Austin CUG talk
IoT Austin CUG talk
 
Data revolution by Doug Cutting
Data revolution by Doug CuttingData revolution by Doug Cutting
Data revolution by Doug Cutting
 
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiWhither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
 

Dernier

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 

Dernier (20)

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 

SFHUG Kudu Talk

  • 1. 1© Cloudera, Inc. All rights reserved. Todd Lipcon on behalf of the Kudu team Kudu: Resolving Transactional and Analytic Trade-offs in Hadoop 1
  • 2. 2© Cloudera, Inc. All rights reserved. The conference for and by Data Scientists, from startup to enterprise wrangleconf.com Public registration is now open! Who: Featuring data scientists from Salesforce, Uber, Pinterest, and more When: Thursday, October 22, 2015 Where: Broadway Studios, San Francisco
  • 3. 3© Cloudera, Inc. All rights reserved. Kudu Storage for Fast Analytics on Fast Data • New updating column store for Hadoop • Apache-licensed open source • Beta now available Columnar Store Kudu
  • 4. 4© Cloudera, Inc. All rights reserved. Motivation and Goals Why build Kudu? 4
  • 5. 5© Cloudera, Inc. All rights reserved. Motivating Questions • Are there user problems that can we can’t address because of gaps in Hadoop ecosystem storage technologies? • Are we positioned to take advantage of advancements in the hardware landscape?
  • 6. 6© Cloudera, Inc. All rights reserved. Current Storage Landscape in Hadoop HDFS excels at: • Efficiently scanning large amounts of data • Accumulating data with high throughput HBase excels at: • Efficiently finding and writing individual rows • Making data mutable Gaps exist when these properties are needed simultaneously
  • 7. 7© Cloudera, Inc. All rights reserved. • High throughput for big scans (columnar storage and replication) Goal: Within 2x of Parquet • Low-latency for short accesses (primary key indexes and quorum replication) Goal: 1ms read/write on SSD • Database-like semantics (initially single-row ACID) • Relational data model • SQL query • “NoSQL” style scan/insert/update (Java client) Kudu Design Goals
  • 8. 8© Cloudera, Inc. All rights reserved. Changing Hardware landscape • Spinning disk -> solid state storage • NAND flash: Up to 450k read 250k write iops, about 2GB/sec read and 1.5GB/sec write throughput, at a price of less than $3/GB and dropping • 3D XPoint memory (1000x faster than NAND, cheaper than RAM) • RAM is cheaper and more abundant: • 64->128->256GB over last few years • Takeaway 1: The next bottleneck is CPU, and current storage systems weren’t designed with CPU efficiency in mind. • Takeaway 2: Column stores are feasible for random access
  • 9. 9© Cloudera, Inc. All rights reserved. Kudu Usage • Table has a SQL-like schema • Finite number of columns (unlike HBase/Cassandra) • Types: BOOL, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP • Some subset of columns makes up a possibly-composite primary key • Fast ALTER TABLE • Java and C++ “NoSQL” style APIs • Insert(), Update(), Delete(), Scan() • Integrations with MapReduce, Spark, and Impala • more to come! 9
  • 10. 10© Cloudera, Inc. All rights reserved. Use cases and architectures
  • 11. 11© Cloudera, Inc. All rights reserved. Kudu Use Cases Kudu is best for use cases requiring a simultaneous combination of sequential and random reads and writes ● Time Series ○ Examples: Stream market data; fraud detection & prevention; risk monitoring ○ Workload: Insert, updates, scans, lookups ● Machine Data Analytics ○ Examples: Network threat detection ○ Workload: Inserts, scans, lookups ● Online Reporting ○ Examples: ODS ○ Workload: Inserts, updates, scans, lookups
  • 12. 12© Cloudera, Inc. All rights reserved. Real-Time Analytics in Hadoop Today Fraud Detection in the Real World = Storage Complexity Considerations: ● How do I handle failure during this process? ● How often do I reorganize data streaming in into a format appropriate for reporting? ● When reporting, how do I see data that has not yet been reorganized? ● How do I ensure that important jobs aren’t interrupted by maintenance? New Partition Most Recent Partition Historic Data HBase Parquet File Have we accumulated enough data? Reorganize HBase file into Parquet • Wait for running operations to complete • Define new Impala partition referencing the newly written Parquet file Incoming Data (Messaging System) Reporting Request Impala on HDFS
  • 13. 13© Cloudera, Inc. All rights reserved. Real-Time Analytics in Hadoop with Kudu Improvements: ● One system to operate ● No cron jobs or background processes ● Handle late arrivals or data corrections with ease ● New data available immediately for analytics or operations Historical and Real-time Data Incoming Data (Messaging System) Reporting Request Storage in Kudu
  • 14. 14© Cloudera, Inc. All rights reserved. How it works 14
  • 15. 15© Cloudera, Inc. All rights reserved. Tables and Tablets • Table is horizontally partitioned into tablets • Range or hash partitioning • PRIMARY KEY (host, metric, timestamp) DISTRIBUTE BY HASH(timestamp) INTO 100 BUCKETS • Each tablet has N replicas (3 or 5), with Raft consensus • Allow read from any replica, plus leader-driven writes with low MTTR • Tablet servers host tablets • Store data on local disks (no HDFS) 15
  • 16. 16© Cloudera, Inc. All rights reserved. Metadata • Replicated master* • Acts as a tablet directory (“META” table) • Acts as a catalog (table schemas, etc) • Acts as a load balancer (tracks TS liveness, re-replicates under-replicated tablets) • Caches all metadata in RAM for high performance • 80-node load test, GetTableLocations RPC perf: • 99th percentile: 68us, 99.99th percentile: 657us • <2% peak CPU usage • Client configured with master addresses • Asks master for tablet locations as needed and caches them 16
  • 17. 17© Cloudera, Inc. All rights reserved.
  • 18. 18© Cloudera, Inc. All rights reserved. Raft consensus 18 TS A Tablet 1 (LEADER) Client TS B Tablet 1 (FOLLOWER) TS C Tablet 1 (FOLLOWER) WAL WALWAL 2b. Leader writes local WAL 1a. Client->Leader: Write() RPC 2a. Leader->Followers: UpdateConsensus() RPC 3. Follower: write WAL 4. Follower->Leader: success 3. Follower: write WAL 5. Leader has achieved majority 6. Leader->Client: Success!
  • 19. 19© Cloudera, Inc. All rights reserved. Fault tolerance • Transient FOLLOWER failure: • Leader can still achieve majority • Restart follower TS within 5 min and it will rejoin transparently • Transient LEADER failure: • Followers expect to hear a heartbeat from their leader every 1.5 seconds • 3 missed heartbeats: leader election! • New LEADER is elected from remaining nodes within a few seconds • Restart within 5 min and it rejoins as a FOLLOWER • N replicas handle (N-1)/2 failures 19
  • 20. 20© Cloudera, Inc. All rights reserved. Fault tolerance (2) • Permanent failure: • Leader notices that a follower has been dead for 5 minutes • Evicts that follower • Master selects a new replica • Leader copies the data over to the new one, which joins as a new FOLLOWER 20
  • 21. 21© Cloudera, Inc. All rights reserved. Tablet design • Inserts buffered in an in-memory store (like HBase’s memstore) • Flushed to disk • Columnar layout, similar to Apache Parquet • Updates use MVCC (updates tagged with timestamp, not in-place) • Allow “SELECT AS OF <timestamp>” queries and consistent cross-tablet scans • Near-optimal read path for “current time” scans • No per row branches, fast vectorized decoding and predicate evaluation • Performance worsens based on number of recent updates 21
  • 22. 22© Cloudera, Inc. All rights reserved. LSM vs Kudu • LSM – Log Structured Merge (Cassandra, HBase, etc) • Inserts and updates all go to an in-memory map (MemStore) and later flush to on-disk files (HFile/SSTable) • Reads perform an on-the-fly merge of all on-disk HFiles • Kudu • Shares some traits (memstores, compactions) • More complex. • Slower writes in exchange for faster reads (especially scans) • During tonight’s break-out sessions, I can go into excruciating detail 22
  • 23. 23© Cloudera, Inc. All rights reserved. Kudu trade-offs • Random updates will be slower • HBase model allows random updates without incurring a disk seek • Kudu requires a key lookup before update, bloom lookup before insert, may incur seeks • Single-row reads may be slower • Columnar design is optimized for scans • Especially slow at reading a row that has had many recent updates (e.g YCSB “zipfian”) 23
  • 24. 24© Cloudera, Inc. All rights reserved. Benchmarks 24
  • 25. 25© Cloudera, Inc. All rights reserved. TPC-H (Analytics benchmark) • 75TS + 1 master cluster • 12 (spinning) disk each, enough RAM to fit dataset • Using Kudu 0.5.0, Impala 2.2 with Kudu support, CDH 5.4 • TPC-H Scale Factor 100 (100GB) • Example query: • SELECT n_name, sum(l_extendedprice * (1 - l_discount)) as revenue FROM customer, orders, lineitem, supplier, nation, region WHERE c_custkey = o_custkey AND l_orderkey = o_orderkey AND l_suppkey = s_suppkey AND c_nationkey = s_nationkey AND s_nationkey = n_nationkey AND n_regionkey = r_regionkey AND r_name = 'ASIA' AND o_orderdate >= date '1994-01-01' AND o_orderdate < '1995-01-01’ GROUP BY n_name ORDER BY revenue desc; 25
  • 26. 26© Cloudera, Inc. All rights reserved. - Kudu outperforms Parquet by 31% (geometric mean) for RAM-resident data - Parquet likely to outperform Kudu for HDD-resident (larger IO requests)
  • 27. 27© Cloudera, Inc. All rights reserved. What about Apache Phoenix? • 10 node cluster (9 worker, 1 master) • HBase 1.0, Phoenix 4.3 • TPC-H LINEITEM table only (6B rows) 27 2152 219 76 131 0.04 1918 13.2 1.7 0.7 0.15 155 9.3 1.4 1.5 1.37 0.01 0.1 1 10 100 1000 10000 Load TPCH Q1 COUNT(*) COUNT(*) WHERE… single-row lookup Time(sec) Phoenix Kudu Parquet
  • 28. 28© Cloudera, Inc. All rights reserved. What about NoSQL-style random access? (YCSB) • YCSB 0.5.0-snapshot • 10 node cluster (9 worker, 1 master) • HBase 1.0 • 100M rows, 10M ops 28
  • 29. 29© Cloudera, Inc. All rights reserved. But don’t trust me (a vendor)… 29
  • 30. 30© Cloudera, Inc. All rights reserved. About Xiaomi Mobile Internet Company Founded in 2010 Smartphones Software E-commerce MIUI Cloud Services App Store/Game Payment/Finance … Smart Home Smart Devices
  • 31. 31© Cloudera, Inc. All rights reserved. Big Data Analytics Pipeline Before Kudu • Long pipeline high latency(1 hour ~ 1 day), data conversion pains • No ordering Log arrival(storage) order not exactly logical order e.g. read 2-3 days of log for data in 1 day
  • 32. 32© Cloudera, Inc. All rights reserved. Big Data Analysis Pipeline Simplified With Kudu • ETL Pipeline(0~10s latency) Apps that need to prevent backpressure or require ETL • Direct Pipeline(no latency) Apps that don’t require ETL and no backpressure issues OLAP scan Side table lookup Result store
  • 33. 33© Cloudera, Inc. All rights reserved. Use Case 1 Mobile service monitoring and tracing tool Requirements  High write throughput >5 Billion records/day and growing  Query latest data and quick response Identify and resolve issues quickly  Can search for individual records Easy for troubleshooting Gather important RPC tracing events from mobile app and backend service. Service monitoring & troubleshooting tool.
  • 34. 34© Cloudera, Inc. All rights reserved. Use Case 1: Benchmark Environment  71 Node cluster  Hardware CPU: E5-2620 2.1GHz * 24 core Memory: 64GB Network: 1Gb Disk: 12 HDD  Software Hadoop2.6/Impala 2.1/Kudu Data  1 day of server side tracing data ~2.6 Billion rows ~270 bytes/row 17 columns, 5 key columns
  • 35. 35© Cloudera, Inc. All rights reserved. Use Case 1: Benchmark Results 1.4 2.0 2.3 3.1 1.3 0.91.3 2.8 4.0 5.7 7.5 16.7 Q1 Q2 Q3 Q4 Q5 Q6 kudu parquet Total Time(s) Throughput(Total) Throughput(per node) Kudu 961.1 2.8M record/s 39.5k record/s Parquet 114.6 23.5M record/s 331k records/s Bulk load using impala (INSERT INTO): Query latency: * HDFS parquet file replication = 3 , kudu table replication = 3 * Each query run 5 times then take average
  • 36. 36© Cloudera, Inc. All rights reserved. Use Case 1: Result Analysis  Lazy materialization Ideal for search style query Q6 returns only a few records (of a single user) with all columns  Scan range pruning using primary index Predicates on primary key Q5 only scans 1 hour of data  Future work Primary index: speed-up order by and distinct Hash Partitioning: speed-up count(distinct), no need for global shuffle/merge
  • 37. 37© Cloudera, Inc. All rights reserved. Use Case 2 OLAP PaaS for ecosystem cloud  Provide big data service for smart hardware startups (Xiaomi’s ecosystem members)  OLAP database with some OLTP features  Manage/Ingest/query your data and serving results in one place Backend/Mobile App/Smart Device/IoT …
  • 38. 38© Cloudera, Inc. All rights reserved. What Kudu is not 38
  • 39. 39© Cloudera, Inc. All rights reserved. Kudu is… • NOT a SQL database • “BYO SQL” • NOT a filesystem • data must have tabular structure • NOT a replacement for HBase or HDFS • Cloudera continues to invest in those systems • Many use cases where they’re still more appropriate • NOT an in-memory database • Very fast for memory-sized workloads, but can operate on larger data too! 39
  • 40. 40© Cloudera, Inc. All rights reserved. Getting started 40
  • 41. 41© Cloudera, Inc. All rights reserved. Getting started as a user • http://getkudu.io • kudu-user@googlegroups.com • Quickstart VM • Easiest way to get started • Impala and Kudu in an easy-to-install VM • CSD and Parcels • For installation on a Cloudera Manager-managed cluster 41
  • 42. 42© Cloudera, Inc. All rights reserved. Getting started as a developer • http://github.com/cloudera/kudu • All commits go here first • Public gerrit: http://gerrit.cloudera.org • All code reviews happening here • Public JIRA: http://issues.cloudera.org • Includes bugs going back to 2013. Come see our dirty laundry! • kudu-dev@googlegroups.com • Apache 2.0 license open source • Contributions are welcome and encouraged! 42
  • 43. 43© Cloudera, Inc. All rights reserved. http://getkudu.io/ @getkudu