SlideShare une entreprise Scribd logo
1  sur  26
Télécharger pour lire hors ligne
ACID ORC, Iceberg
and Delta Lake
Michal Gancarski
michal.gancarski@zalando.de
17-10-2019
an overview of table formats
for large scale storage and analytics
wssbck
2
TABLE OF
CONTENTS
All Is Not Well In The Land Of Big Data
There Is Hope, However
This Is How We Do It
Moving Forward
3
All Is Not Well
In The Land Of Big Data
4
ACID Properties
5
Single Node Database
6
Distributed Database
7
Distributed Data Infrastructure
8
Lost ACID
9
There Is Hope, However
10
A Table Format?
11
ACID ORC
12
ACID ORC
./d_manufacturers/country=de/base_00000002/
-- bucket_00000
-- bucket_00001
./d_manufacturers/country=de/delta_0000003_0000003_0000/
-- bucket_00000
-- bucket_00001
./d_manufacturers/country=de/delta_0000004_0000004_0000/
-- bucket_00001
./d_manufacturers/country=de/delete_delta_0000004_0000004_0000/
-- bucket_00001
CREATE TABLE d_manufacturers (id int, name string)
PARTITIONED BY (country string)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
13
ACID ORC
❖ Native compatibility with Hive
❖ Fast updates / upserts (no file rewrite)
❖ Hive 2.x ACID ORC tables can be converted to
Hive 3.x ACID ORC tables
❖ Commercial Support (Cloudera)
❖ Limited support for Spark (being worked on by
Qubole)
❖ Slow listing and metadata discovery
❖ Potentially slower read due to ad-hoc compaction
❖ ORC only
❖ Mandatory S3Guard or EMR with consistent view
enabled
+
-
14
Apache Iceberg
15
Apache Iceberg
val df = spark.read
.format("iceberg")
.load("s3://datalake/d_manufacturers")
16
Apache Iceberg
❖ Parquet, Avro, ORC supported as file formats
❖ Robust schema and partitioning changes
❖ Fast query planning
❖ Presto connector
❖ Time travel with snapshot id listing
❖ No dependency on Spark
public List<Snapshot> snapshots() {
return snapshots;
}
❖ Spark support
❖ Sparse documentation
❖ No commercial support
❖ Not as mature as other formats
+
-
17
Delta Lake
18
Delta Lake
./d_manufacturers/_delta_log/
-- 000000.json
-- ...
-- 000010.checkpoint.parquet
-- _latest_checkpoint
./d_manufacturers/country=de/
-- file_1.parquet
-- file_2.parquet
./d_manufacturers/country=fr/
-- file_3.parquet
val df = spark.read
.format("delta")
.load("s3://datalake/d_manufacturers")
CONVERT TO DELTA parquet.`s3://datalake/d_manufacturers`
19
Delta Lake
❖ Great integration with Spark, including Structured
Streaming
❖ Merge syntax in Spark SQL
❖ Time travel
❖ Comprehensive, well written documentation
❖ Fast development backed by a commercial entity
❖ VACUUM + OPTIMIZE
❖ Incoming Presto reader (Starburst)
❖ Parquet only
❖ Multicluster writes outside of Databricks only on HDFS
+
-
20
This Is How We Do It
21
Delta Lake @Zalando
22
Moving Forward
23
The Future is Bright
24
Further Reading
ACID ORC
https://orc.apache.org/docs/acid.html
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions
http://shzhangji.com/blog/2019/06/10/understanding-hive-acid-transactional-table/
https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.0/using-hiveql/content/hive_3_internals.html
Iceberg
https://iceberg.apache.org/
https://iceberg.apache.org/spec/
https://github.com/apache/incubator-iceberg
https://www.youtube.com/watch?v=z7p_m17BXs8
https://www.youtube.com/watch?v=nWwQMlrjhy0
Delta Lake
https://delta.io/
https://github.com/delta-io
https://github.com/delta-io/delta/blob/master/PROTOCOL.md
https://databricks.com/blog/2019/08/21/diving-into-delta-lake-unpacking-the-transaction-log.html
https://databricks.com/blog/2019/09/24/diving-into-delta-lake-schema-enforcement-evolution.html
25
Further Reading
Engine Support
https://github.com/prestosql/presto/issues/576
https://github.com/prestosql/presto/issues/1324
https://github.com/prestosql/presto/pull/1067
https://docs.databricks.com/delta/presto-compatibility.html
https://www.starburstdata.com/technical-blog/starburst-presto-databricks-delta-lake-support/
https://www.qubole.com/blog/qubole-open-sources-multi-engine-support-for-updates-and-deletes-in-data-lakes/
https://github.com/qubole/spark-acid
S3 Consistency
https://issues.apache.org/jira/browse/HADOOP-13345
https://hadoop.apache.org/docs/r3.0.3/hadoop-aws/tools/hadoop-aws/s3guard.html
Other
https://www.postgresql.org/docs/current/storage.html
https://www.postgresql.org/docs/current/routine-vacuuming.html
https://dev.mysql.com/doc/refman/8.0/en/optimize-table.html
https://medium.com/@brunocrt/the-distributed-architecture-behind-cassandra-database-fba8b5cc4785
https://github.com/delta-io/delta/issues/41
26
ACID ORC, Iceberg
and Delta Lake
Michal Gancarski
michal.gancarski@zalando.de
wssbck

Contenu connexe

Tendances

Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architectureAdam Doyle
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeDatabricks
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAAdam Doyle
 
High-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLHigh-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLScyllaDB
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheDremio Corporation
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 
Designing Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things RightDesigning Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things RightDatabricks
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroDatabricks
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introductionsudhakara st
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark Mostafa
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseDatabricks
 
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Flink Forward
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Databricks
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsGuido Schmutz
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Flink Forward
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 

Tendances (20)

Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 
Spark architecture
Spark architectureSpark architecture
Spark architecture
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
 
High-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLHigh-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQL
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
Designing Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things RightDesigning Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things Right
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka Streams
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 

Similaire à ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scale Storage and Analytics

Automate Oracle database patches and upgrades using Fleet Provisioning and Pa...
Automate Oracle database patches and upgrades using Fleet Provisioning and Pa...Automate Oracle database patches and upgrades using Fleet Provisioning and Pa...
Automate Oracle database patches and upgrades using Fleet Provisioning and Pa...Nelson Calero
 
Oracle sharding : Installation & Configuration
Oracle sharding : Installation & ConfigurationOracle sharding : Installation & Configuration
Oracle sharding : Installation & Configurationsuresh gandhi
 
DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on AuroraDUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on AuroraAndrey Kudryavtsev
 
Spark Intro @ analytics big data summit
Spark  Intro @ analytics big data summitSpark  Intro @ analytics big data summit
Spark Intro @ analytics big data summitSujee Maniyam
 
2017 10-oow-fma-application-containers-v01-final
2017 10-oow-fma-application-containers-v01-final2017 10-oow-fma-application-containers-v01-final
2017 10-oow-fma-application-containers-v01-finalMarkus Flechtner
 
Experiences with Oracle SPARC S7-2 Server
Experiences with Oracle SPARC S7-2 ServerExperiences with Oracle SPARC S7-2 Server
Experiences with Oracle SPARC S7-2 ServerJomaSoft
 
Quick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skillsQuick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skillsRavindra kumar
 
Oracle Application Containers
Oracle Application ContainersOracle Application Containers
Oracle Application ContainersMarkus Flechtner
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark Hubert Fan Chiang
 
Cowboy dating with big data TechDays at Lohika-2020
Cowboy dating with big data TechDays at Lohika-2020Cowboy dating with big data TechDays at Lohika-2020
Cowboy dating with big data TechDays at Lohika-2020b0ris_1
 
Oracle RAC One Node 12c Overview
Oracle RAC One Node 12c OverviewOracle RAC One Node 12c Overview
Oracle RAC One Node 12c OverviewMarkus Michalewicz
 
An Introduction to Spark
An Introduction to SparkAn Introduction to Spark
An Introduction to Sparkjlacefie
 
An Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark MeetupAn Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark Meetupjlacefie
 
오라클 DR 및 복제 솔루션(Dbvisit 소개)
오라클 DR 및 복제 솔루션(Dbvisit 소개)오라클 DR 및 복제 솔루션(Dbvisit 소개)
오라클 DR 및 복제 솔루션(Dbvisit 소개)Linux Foundation Korea
 
Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォーム
Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォームPivotal Greenplum 次世代マルチクラウド・データ分析プラットフォーム
Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォームMasayuki Matsushita
 
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014Philippe Fierens
 
Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
 Lessons from the Field, Episode II: Applying Best Practices to Your Apache S... Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...Databricks
 
ORACLE 12C DATA GUARD: FAR SYNC, REAL-TIME CASCADE STANDBY AND OTHER GOODIES
ORACLE 12C DATA GUARD: FAR SYNC, REAL-TIME CASCADE STANDBY AND OTHER GOODIESORACLE 12C DATA GUARD: FAR SYNC, REAL-TIME CASCADE STANDBY AND OTHER GOODIES
ORACLE 12C DATA GUARD: FAR SYNC, REAL-TIME CASCADE STANDBY AND OTHER GOODIESLudovico Caldara
 

Similaire à ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scale Storage and Analytics (20)

Automate Oracle database patches and upgrades using Fleet Provisioning and Pa...
Automate Oracle database patches and upgrades using Fleet Provisioning and Pa...Automate Oracle database patches and upgrades using Fleet Provisioning and Pa...
Automate Oracle database patches and upgrades using Fleet Provisioning and Pa...
 
Load demo-oct2016
Load demo-oct2016Load demo-oct2016
Load demo-oct2016
 
Oracle sharding : Installation & Configuration
Oracle sharding : Installation & ConfigurationOracle sharding : Installation & Configuration
Oracle sharding : Installation & Configuration
 
DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on AuroraDUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
 
Spark Intro @ analytics big data summit
Spark  Intro @ analytics big data summitSpark  Intro @ analytics big data summit
Spark Intro @ analytics big data summit
 
2017 10-oow-fma-application-containers-v01-final
2017 10-oow-fma-application-containers-v01-final2017 10-oow-fma-application-containers-v01-final
2017 10-oow-fma-application-containers-v01-final
 
Experiences with Oracle SPARC S7-2 Server
Experiences with Oracle SPARC S7-2 ServerExperiences with Oracle SPARC S7-2 Server
Experiences with Oracle SPARC S7-2 Server
 
Quick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skillsQuick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skills
 
Oracle Application Containers
Oracle Application ContainersOracle Application Containers
Oracle Application Containers
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark
 
MariaDB pres at LeMUG
MariaDB pres at LeMUGMariaDB pres at LeMUG
MariaDB pres at LeMUG
 
Cowboy dating with big data TechDays at Lohika-2020
Cowboy dating with big data TechDays at Lohika-2020Cowboy dating with big data TechDays at Lohika-2020
Cowboy dating with big data TechDays at Lohika-2020
 
Oracle RAC One Node 12c Overview
Oracle RAC One Node 12c OverviewOracle RAC One Node 12c Overview
Oracle RAC One Node 12c Overview
 
An Introduction to Spark
An Introduction to SparkAn Introduction to Spark
An Introduction to Spark
 
An Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark MeetupAn Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark Meetup
 
오라클 DR 및 복제 솔루션(Dbvisit 소개)
오라클 DR 및 복제 솔루션(Dbvisit 소개)오라클 DR 및 복제 솔루션(Dbvisit 소개)
오라클 DR 및 복제 솔루션(Dbvisit 소개)
 
Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォーム
Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォームPivotal Greenplum 次世代マルチクラウド・データ分析プラットフォーム
Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォーム
 
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
 
Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
 Lessons from the Field, Episode II: Applying Best Practices to Your Apache S... Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
Lessons from the Field, Episode II: Applying Best Practices to Your Apache S...
 
ORACLE 12C DATA GUARD: FAR SYNC, REAL-TIME CASCADE STANDBY AND OTHER GOODIES
ORACLE 12C DATA GUARD: FAR SYNC, REAL-TIME CASCADE STANDBY AND OTHER GOODIESORACLE 12C DATA GUARD: FAR SYNC, REAL-TIME CASCADE STANDBY AND OTHER GOODIES
ORACLE 12C DATA GUARD: FAR SYNC, REAL-TIME CASCADE STANDBY AND OTHER GOODIES
 

Plus de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Plus de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Dernier

Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 

Dernier (20)

Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scale Storage and Analytics