Comprehensive View on
Intervals in Apache Spark 3.2
Maxim Gekk
Software Engineer @ Databricks
About me
Databricks Software Engineer
Apache Spark committer
@MaxGekk
Agenda
▪ Overview of new interval types in Spark 3.2
▪ Limitations of CalendarIntervalType
▪ Year-Month Interval
▪ Day-Time Interval
SPARK-27790
Support ANSI SQL INTERVAL types
• Spark SQL 3.2 introduces two new Catalyst types: the year-month interval and day-time interval types.
• CalendarIntervalType is no longer recommended and will be deprecated.
CalendarIntervalType is a combination of (months, days, microseconds)
Problems of CalendarIntervalType: not comparable
Problems of CalendarIntervalType: unordered
Problems of CalendarIntervalType:
I. Cannot be persisted to any external storage
II. Inefficient memory usage: ~12 bytes per value
III. Incompatible with the SQL standard
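A rough illustration of the comparability problem, in plain Python rather than Spark: a value that mixes months with days has no fixed length, so whether "1 month" is shorter or longer than "30 days" depends on the calendar month it is applied to. The helper name `compare_one_month_to_days` is hypothetical and only models the idea.

```python
import calendar


def compare_one_month_to_days(year: int, month: int, days: int) -> int:
    """Compare one calendar month starting in (year, month) with `days` days.

    Returns -1 if the month is shorter, 0 if equal, 1 if longer.
    """
    month_length = calendar.monthrange(year, month)[1]
    return (month_length > days) - (month_length < days)


if __name__ == "__main__":
    # The answer changes with the month, so no fixed ordering exists.
    for month in (2, 3, 4):
        print(2021, month, compare_one_month_to_days(2021, month, 30))
```

A year-month or day-time interval, by contrast, reduces to a single integer (months or microseconds), so ordering is ordinary integer comparison.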
SQL standard interval types
Year-Month Interval = (YEAR, MONTH)
Day-Time Interval = (DAY, HOUR, MINUTE, SECOND)
New Catalyst types in Apache Spark 3.2
• YearMonthIntervalType
▪ Precision: months
▪ Comparable and orderable
▪ Value size: 4 bytes
▪ Minimal value: INTERVAL '-178956970-8' YEAR TO MONTH
▪ Maximum value: INTERVAL '178956970-7' YEAR TO MONTH
• DayTimeIntervalType
▪ Precision: microseconds
▪ Comparable and orderable
▪ Value size: 8 bytes
▪ Minimal value: INTERVAL '-106751991 04:00:54.775808' DAY TO SECOND
▪ Maximum value: INTERVAL '106751991 04:00:54.775807' DAY TO SECOND
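The bounds follow from the value sizes: a 4-byte value holds a signed 32-bit count of months, and an 8-byte value holds a signed 64-bit count of microseconds. The sketch below is plain Python, not Spark code, and the formatter names are hypothetical. It derives the extreme literals from those integer ranges.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def format_year_month(total_months: int) -> str:
    """Render a month count as the 'years-months' body of a YEAR TO MONTH literal."""
    sign = "-" if total_months < 0 else ""
    years, months = divmod(abs(total_months), 12)
    return f"{sign}{years}-{months}"


def format_day_time(total_micros: int) -> str:
    """Render a microsecond count as the 'days hh:mm:ss.ffffff' body of a DAY TO SECOND literal."""
    sign = "-" if total_micros < 0 else ""
    rest = abs(total_micros)
    days, rest = divmod(rest, 86_400_000_000)
    hours, rest = divmod(rest, 3_600_000_000)
    minutes, rest = divmod(rest, 60_000_000)
    seconds, micros = divmod(rest, 1_000_000)
    return f"{sign}{days} {hours:02d}:{minutes:02d}:{seconds:02d}.{micros:06d}"


if __name__ == "__main__":
    print(format_year_month(INT32_MIN), format_year_month(INT32_MAX))
    print(format_day_time(INT64_MIN), format_day_time(INT64_MAX))
```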
Creation of interval columns
• From interval strings
▪ Interval literals:
INTERVAL '1-1' YEAR TO MONTH
INTERVAL '1 02:03:04' DAY TO SECOND
▪ Casting strings to interval types:
$"col".cast(YearMonthIntervalType())
• From external types
▪ Parallelize a collection of java.time.Period:
Seq(Period.ofMonths(10)).toDS
▪ Parallelize a collection of java.time.Duration:
Seq(Duration.ofDays(10)).toDS
• From integral fields
▪ Constructor functions for interval types:
make_ym_interval(1, 2)
make_dt_interval(1, 2, 3, 4.123)
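To make the string forms concrete, here is a minimal plain-Python sketch of how the two literal bodies could map onto the underlying integers (months and microseconds). It is not Spark's parser. The function names and the accepted grammar are assumptions chosen for illustration, and only the full YEAR TO MONTH and DAY TO SECOND forms are handled.

```python
import re


def parse_year_month(text: str) -> int:
    """Parse '[sign]years-months' into a total number of months."""
    match = re.fullmatch(r"([+-]?)(\d+)-(\d+)", text.strip())
    if match is None:
        raise ValueError(f"not a year-month interval string: {text!r}")
    sign, years, months = match.groups()
    if int(months) > 11:
        raise ValueError("months field must be in 0..11")
    total = int(years) * 12 + int(months)
    return -total if sign == "-" else total


def parse_day_time(text: str) -> int:
    """Parse '[sign]days hh:mm:ss[.ffffff]' into a total number of microseconds."""
    pattern = r"([+-]?)(\d+) (\d{1,2}):(\d{1,2}):(\d{1,2})(?:\.(\d{1,6}))?"
    match = re.fullmatch(pattern, text.strip())
    if match is None:
        raise ValueError(f"not a day-time interval string: {text!r}")
    sign, days, hours, minutes, seconds, fraction = match.groups()
    # Pad the fraction to six digits so '.5' means 500000 microseconds.
    micros = int((fraction or "").ljust(6, "0"))
    total_seconds = ((int(days) * 24 + int(hours)) * 60 + int(minutes)) * 60 + int(seconds)
    total = total_seconds * 1_000_000 + micros
    return -total if sign == "-" else total


if __name__ == "__main__":
    print(parse_year_month("1-1"))        # body of INTERVAL '1-1' YEAR TO MONTH
    print(parse_day_time("1 02:03:04"))   # body of INTERVAL '1 02:03:04' DAY TO SECOND
```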
Operations involving datetimes and intervals
Arithmetic operations involving values of type datetime or interval obey the natural rules associated
with dates and times and yield valid datetime or interval results according to the Gregorian calendar.
Interval operations in Apache Spark 3.2
▪ YearMonthIntervalType [* | /] NumericType = YearMonthIntervalType
▪ YearMonthIntervalType [+ | -] YearMonthIntervalType = YearMonthIntervalType
▪ DayTimeIntervalType [* | /] NumericType = DayTimeIntervalType
▪ DayTimeIntervalType [+ | -] DayTimeIntervalType = DayTimeIntervalType
▪ TimestampType - TimestampType = DayTimeIntervalType
▪ DateType - DateType = DayTimeIntervalType
▪ DateType [+ | -] YearMonthIntervalType = DateType
▪ TimestampType [+ | -] YearMonthIntervalType = TimestampType
▪ DateType [+ | -] DayTimeIntervalType = TimestampType
▪ TimestampType [+ | -] DayTimeIntervalType = TimestampType
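The result types above can be mimicked with Python's standard `datetime` module. This is a sketch of the arithmetic, not of Spark's implementation. The function names are hypothetical, and clamping to the last day of a shorter month is an assumption about how "valid results according to the Gregorian calendar" is satisfied.

```python
import calendar
from datetime import date, datetime, time, timedelta

MICROS_PER_DAY = 86_400_000_000


def add_months(d: date, months: int) -> date:
    """date +/- year-month interval -> date, clamped to the last valid day of the month."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return d.replace(year=year, month=month, day=min(d.day, last_day))


def date_plus_day_time(d: date, micros: int) -> datetime:
    """date +/- day-time interval -> timestamp (midnight of the date plus the offset)."""
    return datetime.combine(d, time.min) + timedelta(microseconds=micros)


def date_minus_date(end: date, start: date) -> int:
    """date - date -> day-time interval, expressed here in microseconds."""
    return (end - start).days * MICROS_PER_DAY


if __name__ == "__main__":
    print(add_months(date(2021, 1, 31), 1))
    print(date_plus_day_time(date(2021, 1, 1), 93_784_000_000))
    print(date_minus_date(date(2021, 3, 1), date(2021, 2, 1)))
```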
date + day-time interval
[SPARK-35051][SQL] Support add/subtract of a day-time interval to/from a date
spark.sql.legacy.interval.enabled
• When set to true, Spark SQL uses the mixed legacy interval type CalendarIntervalType instead of the ANSI-compliant interval types YearMonthIntervalType and DayTimeIntervalType.
• It affects:
▪ Date and timestamp subtraction
▪ Parsing of ANSI interval literals: INTERVAL '1 02:03:04' DAY TO SECOND
Daylight saving time
External Java types
▪ java.time.Period: This class models a quantity or amount of time in terms of years, months and days. Spark takes the years and months fields only.
▪ java.time.Duration: This class models a quantity or amount of time in terms of seconds and nanoseconds. Spark converts the nanoseconds to microseconds.
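The two conversions can be sketched in a few lines of plain Python. The function names are hypothetical, the arguments stand in for the fields of `java.time.Period` and `java.time.Duration`, and dropping sub-microsecond precision by integer division is an assumption about how the nanoseconds are reduced.

```python
def period_to_months(years: int, months: int, days: int) -> int:
    """Model of Period -> year-month interval: the days field is ignored."""
    return years * 12 + months


def duration_to_micros(seconds: int, nanos: int) -> int:
    """Model of Duration -> day-time interval, with nanos in 0..999_999_999."""
    if not 0 <= nanos < 1_000_000_000:
        raise ValueError("nanos must be in 0..999_999_999")
    # Assumption: precision finer than a microsecond is discarded.
    return seconds * 1_000_000 + nanos // 1_000


if __name__ == "__main__":
    print(period_to_months(1, 2, 25))          # the 25 days do not contribute
    print(duration_to_micros(10 * 86_400, 0))  # ten days in microseconds
    print(duration_to_micros(0, 1_999))        # 1999 ns keeps only 1 microsecond
```

One consequence of the first rule is that a `Period` built from days alone maps to a zero-length year-month interval in this model.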
ANSI intervals in UDF/UDAF
Day-Time Interval
Year-Month Interval
Specification of interval types in schemas
• Year-Month Interval Type
CREATE TABLE tbl (
  id INT,
  delay INTERVAL YEAR TO MONTH
)
• Day-Time Interval Type
CREATE TABLE tbl (
  len INT,
  tout INTERVAL DAY TO SECOND
)
SPARK-27790: Support ANSI SQL INTERVAL types:
Milestone 1 – Spark Interval equivalency (the new interval types meet or exceed all functions of the existing SQL Interval)
Milestone 2 – Persistence:
Ability to create tables of type interval
Ability to write to common file formats such as Parquet and JSON.
INSERT, SELECT, UPDATE, MERGE
Discovery
Milestone 3 – Client support
JDBC support
Hive Thrift server
Milestone 4 – PySpark and Spark R integration
Python UDF can take and return intervals
DataFrame support
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.
