SlideShare une entreprise Scribd logo
1  sur  33
Omid: A Transactional Framework for HBase
Francisco Perez-Sorrosal
Ohad Shacham
Hadoop Summit SJ
June 29th, 2016
Outline
 Background
 Basic Concepts
 Use cases
 Architecture
 Transaction Management
 High Availability
 Performance
 Summary
Hadoop Summit SJ (June 29th 2016)2
 New Big data apps → new requirements:
● Low-latency
● Incremental data processing
● e.g. Percolator
 Multiple clients updating same data concurrently
● Problem: Conflicts/Inconsistencies may arise
● Solution: Transactional Access to Data
Background
Hadoop Summit SJ (June 29th 2016)3
 Transaction → Abstract UoW to manage data with certain
guarantees
● ACID
● Relational databases
 Big data → NoSQL datastores → Transactions in NoSQL
● Relaxed Guarantees:
○ e.g. Atomicity, Consistency
Background
● Hard to Scale
○ Data partition
○ Data replication
Hadoop Summit SJ (June 29th 2016)4
 Flexible
 Reliable
 High Performant
 Scalable
…OLTP framework that allows BigData apps to
execute ACID transactions on top of HBase
+ =
Consistency in
BigData Apps
Omid is a…
Hadoop Summit SJ (June 29th 2016)5
Why use Omid?
 Simplifies development of apps requiring consistency
● Multi-row/multi-table transactions on HBase
● Simple & well-known interface
 Good performance & reliability
 Lock-free
 Snapshot Isolation
 HBase is a blackbox
● No HBase code modification
● No changes on table schemas
 Used successfully at Yahoo
Hadoop Summit SJ (June 29th 2016)6
Snapshot Isolation
▪ Transaction T2 overlaps in time with T1 & T3, but spatially:
● T1 ∩ T2 = ∅
● T2 ∩ T3 = { R4 } Transactions T2 and T3 conflict
▪ Transaction T4 does not have conflicts
TxId
T1
T2
T3
T4
Time Overlap Spatial Overlap (WriteSet)
R1 R2 R3 R4
R3 R4
R2 R4
R1 R3
Hadoop Summit SJ (June 29th 2016)7
Sieve
Use Cases: Sieve @ Yahoo
HBase
Internet
Crawler Doc Proc Aggregation
Omid
Feeder
Real-Time
Index
Notifications
Transactional Data Flow
Hadoop Summit SJ (June 29th 2016)8
Hive Metastore Thrift Server
Use Cases:
HBase
HBaseStore
Omid
Hadoop Summit SJ (June 29th 2016)
ObjectStore
Relational
Database
9
Transactional App
Architectural Components
HBase
Omid Client
Transaction Status Oracle
(TSO)
Timestamp
Oracle
Get Start/Commit
Timestamps
Start/Commit TXs
Keep track &
Validate TXs
Commit Table
Compactor
Commit data
R/W data
Guarantee
SI
App Table
Shadow
CellsApp TableApp Table
Shadow
Cells
Hadoop Summit SJ (June 29th 2016)10
Client APIs
▪ Transaction Manager → Create Transactional contexts
Transaction begin();
void commit(Transaction tx);
void rollback(Transaction tx);
▪ Transactional Tables (TTable) → Data access
Result get(Transaction tx, Get g);
void put(Transaction tx, Put p);
ResultScanner getScanner(Transaction tx, Scan s);
Hadoop Summit SJ (June 29th 2016)11
TX Management (Begin TX phase)
Omid Client TSO TO Table/SC CommitTable
Begin TX Get ST
ST=1
TX(ST=1)
R/W Ops for TX (ST=1)
App
Begin TX
R/W Ops (within TX context)
TX Context
R/W Results for TX with ST=1
Read Ops:
Get right results
for TX’s SnapshotWrite Ops:
Build Writeset
for TX
Hadoop Summit SJ (June 29th 2016)12
TX Management (Commit TX Phase)
Omid Client TSO TO Table/SC CommitTable
Commit TX (Writeset)
Get CT
CT=2
TX(CT=2)
App
Commit TX
Check Conflicts
of TX Writeset
in Conflict Map
Persist commit details (ST/CT) for TX
Hadoop Summit SJ (June 29th 2016)13
TX Management (Complete TX Phase)
Omid Client TSO TO Table/SC CommitTable
Update SC for TX (ST=1/CT=2)
App
Complete commit (Cleanup entry for TX with ST=1)
Result
Hadoop Summit SJ (June 29th 2016)14
Transactional App
High Availability
HBase
Omid Client
Transaction Status Oracle
Timestamp
Oracle
Get Start/Commit
Timestamps
Start/Commit TXs
Commit Table
Compactor
Commit data
R/W data
Guarantee
SI
App Table
Shadow
CellsApp TableApp Table
Shadow
Cells
Single
point of
failure
Hadoop Summit SJ (June 29th 2016)15
Timestamp
Oracle
Transaction Status Oracle
Transactional App
High Availability
HBase
Omid Client
Transaction Status Oracle
Timestamp
Oracle
Get Start/Commit
Timestamps
Start/Commit TXs
Commit Table
Compactor
Commit data
R/W data
Guarantee
SI
App Table
Shadow
CellsApp TableApp Table
Shadow
CellsRecovery
State
Primary
/
Backup
Hadoop Summit SJ (June 29th 2016)16
High Availability – Failing Scenario
Omid Client TSO P TSO B Table/SC CommitTableApp
Begin TX
Begin TX Get ST
ST=1
TX(ST=1)
TX 1
TO
Data Store Commit Table
Write(k1, v1) (ST=1)
TX 1 Write(k1, v1)
(k1, v1, 1)
Hadoop Summit SJ (June 29th 2016)17
High Availability – Failing Scenario
Omid Client TSO P TSO B Table/SC CommitTableApp TO
Data Store Commit Table
Write(k2, v2) (ST=1)
Write(k2, v2)
(k1, v1, 1)
(k2, v2, 1)
Commit TX 1{k1, k2}
Commit TX 1
Get CT
CT=2
Persist commit details for TX 1
Hadoop Summit SJ (June 29th 2016)18
High Availability – Failing Scenario
Omid Client TSO B Table/SC CommitTableApp
Begin TX
Begin TX Get ST
ST=3
TX(ST=3)
TX 3
TO
Data Store Commit Table
Read(k1) (ST=3)
TX 3 Read(k1)
(k1, v1, 1)
(k1, v1, 1)
(k2, v2, 1)Hadoop Summit SJ (June 29th 2016)19
High Availability – Failing Scenario
Omid Client TSO B Table/SC CommitTableApp TO
Data Store Commit Table
Return TX 1 CT
(k1, v1, 1)
! exist
! exist
Read(k2) (ST=3)
(k2, v2, 1)
TX 3 Read(k2)
(k2, v2, 1)
CT = 2
Return TX 1 CT
v2
(1, 2)Hadoop Summit SJ (June 29th 2016)20
Timestamp
Oracle
Transaction Status Oracle
Transactional App
High Availability
HBase
Omid Client
Transaction Status Oracle
Timestamp
Oracle
Get Start/Commit
Timestamps
Start/Commit TXs
Commit Table
Compactor
R/W data
Guarantee
SI
App Table
Shadow
CellsApp TableApp Table
Shadow
CellsRecovery
State
Hadoop Summit SJ (June 29th 2016)21
High Availability – Solution
Omid Client TSO P TSO B Table/SC CommitTableApp
Begin TX
Begin TX Get ST
ST=1
TX(ST=1,E=1)
TX 1, 1
TO
Data Store Commit Table
Write(k1, v1) (ST=1)
TX 1 Write(k1, v1)
(k1, v1, 1)
Hadoop Summit SJ (June 29th 2016)22
High Availability – Solution
Omid Client TSO P TSO B Table/SC CommitTableApp TO
Data Store Commit Table
Write(k2, v2) (ST=1)
Write(k2, v2)
(k1, v1, 1)
(k2, v2, 1)
Commit TX 1{k1, k2}
Commit TX 1
Get CT
CT=2
Persist commit details for TX 1
Hadoop Summit SJ (June 29th 2016)23
High Availability – Solution
Omid Client TSO B Table/SC CommitTableApp
Begin TX
Begin TX Get ST
ST=3
TX(ST=3,E=3)
TX 3,3
TO
Data Store Commit Table
Read(k1) (ST=3)
TX 3 Read(k1)
(k1, v1, 1)
(k1, v1, 1)
(k2, v2, 1)Hadoop Summit SJ (June 29th 2016)24
High Availability – Solution
Omid Client TSO B Table/SC CommitTableApp TO
Data Store Commit Table
Return TX1 CT
(k1, v1, 1)
! exist
(k2, v2, 1)
Invalid
Try invalidate
(1, -, invalid)
! exist
Read(k2) (ST=3)
(k2, v2, 1)
TX 3 Read(k2)
Hadoop Summit SJ (June 29th 2016)25
High Availability – Solution
Omid Client TSO B Table/SC CommitTableApp TO
Data Store Commit Table
Return TX 1 CT
(k1, v1, 1)
! exist
! exist
(k2, v2, 1)
(1, 2, invalid)Hadoop Summit SJ (June 29th 2016)26
High Availability
 No runtime overhead in mainstream execution
• Minor overhead after failover
 TSO uses regular writes
 Leases for leader election
• Lease status check before/after writing to Commit Table
Hadoop Summit SJ (June 29th 2016)27
Perf. Improvements: Read-Only Txs
Omid Client TSO/TO Table/SC
Begin TX
TX(ST=1)
Read Ops for TX (ST=1)
App
Begin TX
Read Ops (in TX context)
TX Context
Read Results in Snapshot
Commit TX
Writeset is ∅, so no need to contact TSO!!!Success
Hadoop Summit SJ (June 29th 2016)28
TSO
HBase
Perf. Improvements: Commit Table Writes
Omid
Client
HBase
TSO
Commit
Table
Commit
Data
Omid
Client
Commit
Data
Hadoop Summit SJ (June 29th 2016)29
HBase
TSO
Perf. Improvements: Commit Table Writes
Omid
Client
HBase
TSO
Commit
Table
Commit
Data
Omid
Client
Commit
Data
Hadoop Summit SJ (June 29th 2016)30
0
50
100
150
200
250
300
350
400
1 2 4 6
Tps*103
Commit Table: # Region servers
Omid Throughput with Improvements
Hadoop Summit SJ (June 29th 2016)31
Summary
 Transactions in NoSQL
• Use cases in incremental big data processing
• Snapshot Isolation: Scalable consistency model
 Omid
• Web-scale TPS for HBase
• Reliable and performant
• Battle-tested
• http://omid.incubator.apache.org/
Hadoop Summit SJ (June 29th 2016)32
Questions?
Hadoop Summit SJ (June 29th 2016)33

Contenu connexe

Tendances

IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015Yousun Jeong
 
Unified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamUnified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamDataWorks Summit/Hadoop Summit
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemLarge-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemGyula Fóra
 
Discover How IBM Uses InfluxDB and Grafana to Help Clients Monitor Large Prod...
Discover How IBM Uses InfluxDB and Grafana to Help Clients Monitor Large Prod...Discover How IBM Uses InfluxDB and Grafana to Help Clients Monitor Large Prod...
Discover How IBM Uses InfluxDB and Grafana to Help Clients Monitor Large Prod...InfluxData
 
Lambda architecture with Spark
Lambda architecture with SparkLambda architecture with Spark
Lambda architecture with SparkVincent GALOPIN
 
Real-Time Data Pipelines with Kafka, Spark, and Operational Databases
Real-Time Data Pipelines with Kafka, Spark, and Operational DatabasesReal-Time Data Pipelines with Kafka, Spark, and Operational Databases
Real-Time Data Pipelines with Kafka, Spark, and Operational DatabasesSingleStore
 
Why is My Stream Processing Job Slow? with Xavier Leaute
Why is My Stream Processing Job Slow? with Xavier LeauteWhy is My Stream Processing Job Slow? with Xavier Leaute
Why is My Stream Processing Job Slow? with Xavier LeauteDatabricks
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in DeltaDatabricks
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem DataWorks Summit/Hadoop Summit
 
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache SparkArbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache SparkDatabricks
 
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Gyula Fóra
 
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/HudiVinoth Chandar
 
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaDataWorks Summit
 
Bellevue Big Data meetup: Dive Deep into Spark Streaming
Bellevue Big Data meetup: Dive Deep into Spark StreamingBellevue Big Data meetup: Dive Deep into Spark Streaming
Bellevue Big Data meetup: Dive Deep into Spark StreamingSantosh Sahoo
 
Realtime Reporting using Spark Streaming
Realtime Reporting using Spark StreamingRealtime Reporting using Spark Streaming
Realtime Reporting using Spark StreamingSantosh Sahoo
 
Delta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDelta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDatabricks
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 

Tendances (20)

Big Data Tools in AWS
Big Data Tools in AWSBig Data Tools in AWS
Big Data Tools in AWS
 
Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex
 
IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015
 
Unified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamUnified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache Beam
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemLarge-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
 
Discover How IBM Uses InfluxDB and Grafana to Help Clients Monitor Large Prod...
Discover How IBM Uses InfluxDB and Grafana to Help Clients Monitor Large Prod...Discover How IBM Uses InfluxDB and Grafana to Help Clients Monitor Large Prod...
Discover How IBM Uses InfluxDB and Grafana to Help Clients Monitor Large Prod...
 
Lambda architecture with Spark
Lambda architecture with SparkLambda architecture with Spark
Lambda architecture with Spark
 
Data Pipeline at Tapad
Data Pipeline at TapadData Pipeline at Tapad
Data Pipeline at Tapad
 
Real-Time Data Pipelines with Kafka, Spark, and Operational Databases
Real-Time Data Pipelines with Kafka, Spark, and Operational DatabasesReal-Time Data Pipelines with Kafka, Spark, and Operational Databases
Real-Time Data Pipelines with Kafka, Spark, and Operational Databases
 
Why is My Stream Processing Job Slow? with Xavier Leaute
Why is My Stream Processing Job Slow? with Xavier LeauteWhy is My Stream Processing Job Slow? with Xavier Leaute
Why is My Stream Processing Job Slow? with Xavier Leaute
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in Delta
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
 
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache SparkArbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
 
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
 
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
 
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache Samza
 
Bellevue Big Data meetup: Dive Deep into Spark Streaming
Bellevue Big Data meetup: Dive Deep into Spark StreamingBellevue Big Data meetup: Dive Deep into Spark Streaming
Bellevue Big Data meetup: Dive Deep into Spark Streaming
 
Realtime Reporting using Spark Streaming
Realtime Reporting using Spark StreamingRealtime Reporting using Spark Streaming
Realtime Reporting using Spark Streaming
 
Delta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDelta from a Data Engineer's Perspective
Delta from a Data Engineer's Perspective
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 

En vedette

Omid Efficient Transaction Mgmt and Processing for HBase
Omid Efficient Transaction Mgmt and Processing for HBaseOmid Efficient Transaction Mgmt and Processing for HBase
Omid Efficient Transaction Mgmt and Processing for HBaseDataWorks Summit
 
IoT Crash Course Hadoop Summit SJ
IoT Crash Course Hadoop Summit SJIoT Crash Course Hadoop Summit SJ
IoT Crash Course Hadoop Summit SJDaniel Madrigal
 
Making the leap to BI on Hadoop by Mariani, dave @ atscale
Making the leap to BI on Hadoop by Mariani, dave @ atscaleMaking the leap to BI on Hadoop by Mariani, dave @ atscale
Making the leap to BI on Hadoop by Mariani, dave @ atscaleTin Ho
 
Combining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkCombining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkDataWorks Summit/Hadoop Summit
 
What the #$* is a Business Catalog and why you need it
What the #$* is a Business Catalog and why you need it What the #$* is a Business Catalog and why you need it
What the #$* is a Business Catalog and why you need it DataWorks Summit/Hadoop Summit
 
Open Source Ingredients for Interactive Data Analysis in Spark
Open Source Ingredients for Interactive Data Analysis in Spark Open Source Ingredients for Interactive Data Analysis in Spark
Open Source Ingredients for Interactive Data Analysis in Spark DataWorks Summit/Hadoop Summit
 
Machine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of DataMachine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of DataDataWorks Summit/Hadoop Summit
 
A New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseA New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseDataWorks Summit/Hadoop Summit
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveDataWorks Summit/Hadoop Summit
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewDataWorks Summit/Hadoop Summit
 
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJ
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJIntro to Spark with Zeppelin Crash Course Hadoop Summit SJ
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJDaniel Madrigal
 

En vedette (20)

Omid Efficient Transaction Mgmt and Processing for HBase
Omid Efficient Transaction Mgmt and Processing for HBaseOmid Efficient Transaction Mgmt and Processing for HBase
Omid Efficient Transaction Mgmt and Processing for HBase
 
Real Time BI with Hadoop
Real Time BI with HadoopReal Time BI with Hadoop
Real Time BI with Hadoop
 
IoT Crash Course Hadoop Summit SJ
IoT Crash Course Hadoop Summit SJIoT Crash Course Hadoop Summit SJ
IoT Crash Course Hadoop Summit SJ
 
Using Hadoop for Cognitive Analytics
Using Hadoop for Cognitive AnalyticsUsing Hadoop for Cognitive Analytics
Using Hadoop for Cognitive Analytics
 
Making the leap to BI on Hadoop by Mariani, dave @ atscale
Making the leap to BI on Hadoop by Mariani, dave @ atscaleMaking the leap to BI on Hadoop by Mariani, dave @ atscale
Making the leap to BI on Hadoop by Mariani, dave @ atscale
 
Curb your insecurity with HDP
Curb your insecurity with HDPCurb your insecurity with HDP
Curb your insecurity with HDP
 
The Path to Wellness through Big Data
The Path to Wellness through Big DataThe Path to Wellness through Big Data
The Path to Wellness through Big Data
 
Combining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkCombining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache Spark
 
What the #$* is a Business Catalog and why you need it
What the #$* is a Business Catalog and why you need it What the #$* is a Business Catalog and why you need it
What the #$* is a Business Catalog and why you need it
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
HIPAA Compliance in the Cloud
HIPAA Compliance in the CloudHIPAA Compliance in the Cloud
HIPAA Compliance in the Cloud
 
Real Time Machine Learning Visualization with Spark
Real Time Machine Learning Visualization with SparkReal Time Machine Learning Visualization with Spark
Real Time Machine Learning Visualization with Spark
 
Open Source Ingredients for Interactive Data Analysis in Spark
Open Source Ingredients for Interactive Data Analysis in Spark Open Source Ingredients for Interactive Data Analysis in Spark
Open Source Ingredients for Interactive Data Analysis in Spark
 
Machine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of DataMachine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of Data
 
A New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseA New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouse
 
Extreme Analytics @ eBay
Extreme Analytics @ eBayExtreme Analytics @ eBay
Extreme Analytics @ eBay
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture View
 
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJ
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJIntro to Spark with Zeppelin Crash Course Hadoop Summit SJ
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJ
 
Accelerating Data Warehouse Modernization
Accelerating Data Warehouse ModernizationAccelerating Data Warehouse Modernization
Accelerating Data Warehouse Modernization
 

Similaire à Omid: A Transactional Framework for HBase

Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingGuozhang Wang
 
Scio - Moving to Google Cloud, A Spotify Story
 Scio - Moving to Google Cloud, A Spotify Story Scio - Moving to Google Cloud, A Spotify Story
Scio - Moving to Google Cloud, A Spotify StoryNeville Li
 
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...StreamNative
 
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...DataStax Academy
 
Big Data with SQL Server
Big Data with SQL ServerBig Data with SQL Server
Big Data with SQL ServerMark Kromer
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...StreamNative
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisHelena Edelson
 
Strata Stinger Talk October 2013
Strata Stinger Talk October 2013Strata Stinger Talk October 2013
Strata Stinger Talk October 2013alanfgates
 
Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...StreamNative
 
Moving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureMoving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureGabriele Modena
 
Traitement temps réel chez Streamroot - Golang Paris Juin 2016
Traitement temps réel chez Streamroot - Golang Paris Juin 2016Traitement temps réel chez Streamroot - Golang Paris Juin 2016
Traitement temps réel chez Streamroot - Golang Paris Juin 2016Simon Caplette
 
MongoDB at the Silicon Valley iPhone and iPad Developers' Meetup
MongoDB at the Silicon Valley iPhone and iPad Developers' MeetupMongoDB at the Silicon Valley iPhone and iPad Developers' Meetup
MongoDB at the Silicon Valley iPhone and iPad Developers' MeetupMongoDB
 
Event Driven Microservices
Event Driven MicroservicesEvent Driven Microservices
Event Driven MicroservicesFabrizio Fortino
 
Streaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+TablesStreaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+TablesC4Media
 
Distributed Real-Time Stream Processing: Why and How 2.0
Distributed Real-Time Stream Processing:  Why and How 2.0Distributed Real-Time Stream Processing:  Why and How 2.0
Distributed Real-Time Stream Processing: Why and How 2.0Petr Zapletal
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...HostedbyConfluent
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersXiao Qin
 

Similaire à Omid: A Transactional Framework for HBase (20)

Omid: A transactional Framework for HBase
Omid: A transactional Framework for HBaseOmid: A transactional Framework for HBase
Omid: A transactional Framework for HBase
 
Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream Processing
 
Scio - Moving to Google Cloud, A Spotify Story
 Scio - Moving to Google Cloud, A Spotify Story Scio - Moving to Google Cloud, A Spotify Story
Scio - Moving to Google Cloud, A Spotify Story
 
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
 
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
 
Big Data with SQL Server
Big Data with SQL ServerBig Data with SQL Server
Big Data with SQL Server
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
 
Strata Stinger Talk October 2013
Strata Stinger Talk October 2013Strata Stinger Talk October 2013
Strata Stinger Talk October 2013
 
Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...
 
Moving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureMoving Towards a Streaming Architecture
Moving Towards a Streaming Architecture
 
Traitement temps réel chez Streamroot - Golang Paris Juin 2016
Traitement temps réel chez Streamroot - Golang Paris Juin 2016Traitement temps réel chez Streamroot - Golang Paris Juin 2016
Traitement temps réel chez Streamroot - Golang Paris Juin 2016
 
MongoDB at the Silicon Valley iPhone and iPad Developers' Meetup
MongoDB at the Silicon Valley iPhone and iPad Developers' MeetupMongoDB at the Silicon Valley iPhone and iPad Developers' Meetup
MongoDB at the Silicon Valley iPhone and iPad Developers' Meetup
 
Map Reduce Online
Map Reduce OnlineMap Reduce Online
Map Reduce Online
 
Event Driven Microservices
Event Driven MicroservicesEvent Driven Microservices
Event Driven Microservices
 
Streaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+TablesStreaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+Tables
 
So you think you can stream.pptx
So you think you can stream.pptxSo you think you can stream.pptx
So you think you can stream.pptx
 
Distributed Real-Time Stream Processing: Why and How 2.0
Distributed Real-Time Stream Processing:  Why and How 2.0Distributed Real-Time Stream Processing:  Why and How 2.0
Distributed Real-Time Stream Processing: Why and How 2.0
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
 

Plus de DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 

Plus de DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 

Dernier

Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Dernier (20)

Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

Omid: A Transactional Framework for HBase

  • 1. Omid: A Transactional Framework for HBase Francisco Perez-Sorrosal Ohad Shacham Hadoop Summit SJ June 29th, 2016
  • 2. Outline  Background  Basic Concepts  Use cases  Architecture  Transaction Management  High Availability  Performance  Summary Hadoop Summit SJ (June 29th 2016)2
  • 3.  New Big data apps → new requirements: ● Low-latency ● Incremental data processing ● e.g. Percolator  Multiple clients updating same data concurrently ● Problem: Conflicts/Inconsistencies may arise ● Solution: Transactional Access to Data Background Hadoop Summit SJ (June 29th 2016)3
  • 4.  Transaction → Abstract UoW to manage data with certain guarantees ● ACID ● Relational databases  Big data → NoSQL datastores → Transactions in NoSQL ● Relaxed Guarantees: ○ e.g. Atomicity, Consistency Background ● Hard to Scale ○ Data partition ○ Data replication Hadoop Summit SJ (June 29th 2016)4
  • 5.  Flexible  Reliable  High Performant  Scalable …OLTP framework that allows BigData apps to execute ACID transactions on top of HBase + = Consistency in BigData Apps Omid is a… Hadoop Summit SJ (June 29th 2016)5
  • 6. Why use Omid?  Simplifies development of apps requiring consistency ● Multi-row/multi-table transactions on HBase ● Simple & well-known interface  Good performance & reliability  Lock-free  Snapshot Isolation  HBase is a blackbox ● No HBase code modification ● No changes on table schemas  Used successfully at Yahoo Hadoop Summit SJ (June 29th 2016)6
  • 7. Snapshot Isolation ▪ Transaction T2 overlaps in time with T1 & T3, but spatially: ● T1 ∩ T2 = ∅ ● T2 ∩ T3 = { R4 } Transactions T2 and T3 conflict ▪ Transaction T4 does not have conflicts TxId T1 T2 T3 T4 Time Overlap Spatial Overlap (WriteSet) R1 R2 R3 R4 R3 R4 R2 R4 R1 R3 Hadoop Summit SJ (June 29th 2016)7
  • 8. Sieve Use Cases: Sieve @ Yahoo HBase Internet Crawler Doc Proc Aggregation Omid Feeder Real-Time Index Notifications Transactional Data Flow Hadoop Summit SJ (June 29th 2016)8
  • 9. Hive Metastore Thrift Server Use Cases: HBase HBaseStore Omid Hadoop Summit SJ (June 29th 2016) ObjectStore Relational Database 9
  • 10. Transactional App Architectural Components HBase Omid Client Transaction Status Oracle (TSO) Timestamp Oracle Get Start/Commit Timestamps Start/Commit TXs Keep track & Validate TXs Commit Table Compactor Commit data R/W data Guarantee SI App Table Shadow CellsApp TableApp Table Shadow Cells Hadoop Summit SJ (June 29th 2016)10
  • 11. Client APIs ▪ Transaction Manager → Create Transactional contexts Transaction begin(); void commit(Transaction tx); void rollback(Transaction tx); ▪ Transactional Tables (TTable) → Data access Result get(Transaction tx, Get g); void put(Transaction tx, Put p); ResultScanner getScanner(Transaction tx, Scan s); Hadoop Summit SJ (June 29th 2016)11
  • 12. TX Management (Begin TX phase) Omid Client TSO TO Table/SC CommitTable Begin TX Get ST ST=1 TX(ST=1) R/W Ops for TX (ST=1) App Begin TX R/W Ops (within TX context) TX Context R/W Results for TX with ST=1 Read Ops: Get right results for TX’s SnapshotWrite Ops: Build Writeset for TX Hadoop Summit SJ (June 29th 2016)12
  • 13. TX Management (Commit TX Phase) Omid Client TSO TO Table/SC CommitTable Commit TX (Writeset) Get CT CT=2 TX(CT=2) App Commit TX Check Conflicts of TX Writeset in Conflict Map Persist commit details (ST/CT) for TX Hadoop Summit SJ (June 29th 2016)13
  • 14. TX Management (Complete TX Phase) Omid Client TSO TO Table/SC CommitTable Update SC for TX (ST=1/CT=2) App Complete commit (Cleanup entry for TX with ST=1) Result Hadoop Summit SJ (June 29th 2016)14
  • 15. Transactional App High Availability HBase Omid Client Transaction Status Oracle Timestamp Oracle Get Start/Commit Timestamps Start/Commit TXs Commit Table Compactor Commit data R/W data Guarantee SI App Table Shadow CellsApp TableApp Table Shadow Cells Single point of failure Hadoop Summit SJ (June 29th 2016)15
  • 16. Timestamp Oracle Transaction Status Oracle Transactional App High Availability HBase Omid Client Transaction Status Oracle Timestamp Oracle Get Start/Commit Timestamps Start/Commit TXs Commit Table Compactor Commit data R/W data Guarantee SI App Table Shadow CellsApp TableApp Table Shadow CellsRecovery State Primary / Backup Hadoop Summit SJ (June 29th 2016)16
  • 17. High Availability – Failing Scenario Omid Client TSO P TSO B Table/SC CommitTableApp Begin TX Begin TX Get ST ST=1 TX(ST=1) TX 1 TO Data Store Commit Table Write(k1, v1) (ST=1) TX 1 Write(k1, v1) (k1, v1, 1) Hadoop Summit SJ (June 29th 2016)17
  • 18. High Availability – Failing Scenario Omid Client TSO P TSO B Table/SC CommitTableApp TO Data Store Commit Table Write(k2, v2) (ST=1) Write(k2, v2) (k1, v1, 1) (k2, v2, 1) Commit TX 1{k1, k2} Commit TX 1 Get CT CT=2 Persist commit details for TX 1 Hadoop Summit SJ (June 29th 2016)18
  • 19. High Availability – Failing Scenario Omid Client TSO B Table/SC CommitTableApp Begin TX Begin TX Get ST ST=3 TX(ST=3) TX 3 TO Data Store Commit Table Read(k1) (ST=3) TX 3 Read(k1) (k1, v1, 1) (k1, v1, 1) (k2, v2, 1)Hadoop Summit SJ (June 29th 2016)19
  • 20. High Availability – Failing Scenario Omid Client TSO B Table/SC CommitTableApp TO Data Store Commit Table Return TX 1 CT (k1, v1, 1) ! exist ! exist Read(k2) (ST=3) (k2, v2, 1) TX 3 Read(k2) (k2, v2, 1) CT = 2 Return TX 1 CT v2 (1, 2)Hadoop Summit SJ (June 29th 2016)20
  • 21. Timestamp Oracle Transaction Status Oracle Transactional App High Availability HBase Omid Client Transaction Status Oracle Timestamp Oracle Get Start/Commit Timestamps Start/Commit TXs Commit Table Compactor R/W data Guarantee SI App Table Shadow CellsApp TableApp Table Shadow CellsRecovery State Hadoop Summit SJ (June 29th 2016)21
  • 22. High Availability – Solution Omid Client TSO P TSO B Table/SC CommitTableApp Begin TX Begin TX Get ST ST=1 TX(ST=1,E=1) TX 1, 1 TO Data Store Commit Table Write(k1, v1) (ST=1) TX 1 Write(k1, v1) (k1, v1, 1) Hadoop Summit SJ (June 29th 2016)22
  • 23. High Availability – Solution Omid Client TSO P TSO B Table/SC CommitTableApp TO Data Store Commit Table Write(k2, v2) (ST=1) Write(k2, v2) (k1, v1, 1) (k2, v2, 1) Commit TX 1{k1, k2} Commit TX 1 Get CT CT=2 Persist commit details for TX 1 Hadoop Summit SJ (June 29th 2016)23
  • 24. High Availability – Solution Omid Client TSO B Table/SC CommitTableApp Begin TX Begin TX Get ST ST=3 TX(ST=3,E=3) TX 3,3 TO Data Store Commit Table Read(k1) (ST=3) TX 3 Read(k1) (k1, v1, 1) (k1, v1, 1) (k2, v2, 1)Hadoop Summit SJ (June 29th 2016)24
  • 25. High Availability – Solution Omid Client TSO B Table/SC CommitTableApp TO Data Store Commit Table Return TX1 CT (k1, v1, 1) ! exist (k2, v2, 1) Invalid Try invalidate (1, -, invalid) ! exist Read(k2) (ST=3) (k2, v2, 1) TX 3 Read(k2) Hadoop Summit SJ (June 29th 2016)25
  • 26. High Availability – Solution Omid Client TSO B Table/SC CommitTableApp TO Data Store Commit Table Return TX 1 CT (k1, v1, 1) ! exist ! exist (k2, v2, 1) (1, 2, invalid)Hadoop Summit SJ (June 29th 2016)26
  • 27. High Availability  No runtime overhead in mainstream execution • Minor overhead after failover  TSO uses regular writes  Leases for leader election • Lease status check before/after writing to Commit Table Hadoop Summit SJ (June 29th 2016)27
  • 28. Perf. Improvements: Read-Only Txs Omid Client TSO/TO Table/SC Begin TX TX(ST=1) Read Ops for TX (ST=1) App Begin TX Read Ops (in TX context) TX Context Read Results in Snapshot Commit TX Writeset is ∅, so no need to contact TSO!!!Success Hadoop Summit SJ (June 29th 2016)28
  • 29. TSO HBase Perf. Improvements: Commit Table Writes Omid Client HBase TSO Commit Table Commit Data Omid Client Commit Data Hadoop Summit SJ (June 29th 2016)29
  • 30. HBase TSO Perf. Improvements: Commit Table Writes Omid Client HBase TSO Commit Table Commit Data Omid Client Commit Data Hadoop Summit SJ (June 29th 2016)30
  • 31. 0 50 100 150 200 250 300 350 400 1 2 4 6 Tps*103 Commit Table: # Region servers Omid Throughput with Improvements Hadoop Summit SJ (June 29th 2016)31
  • 32. Summary  Transactions in NoSQL • Use cases in incremental big data processing • Snapshot Isolation: Scalable consistency model  Omid • Web-scale TPS for HBase • Reliable and performant • Battle-tested • http://omid.incubator.apache.org/ Hadoop Summit SJ (June 29th 2016)32
  • 33. Questions? Hadoop Summit SJ (June 29th 2016)33