SFScon22 - Anton Dignoes - Managing Temporal Data in PostgreSQL.pdf


Every application has to store and manage data that, in one form or another, has a temporal extent. For some years now, database systems have been integrating features that help with the management of this kind of data. In this talk, we dive into the support that the open-source database system PostgreSQL, together with its ecosystem, provides to facilitate the querying and processing of temporal data. We will also take a brief look at the projects we are working on at unibz in this context.


  1. Managing Temporal Data in PostgreSQL
     Anton Dignös, Free University of Bozen-Bolzano
     SFScon 2022, November 11, 2022
     AUTONOME PROVINZ BOZEN SÜDTIROL / PROVINCIA AUTONOMA DI BOLZANO ALTO ADIGE
     Research Südtirol/Alto Adige 2019, Project ISTeP, CUP: I52F20000250003
     EFRE 2014-2020, Project EFRE1164 PREMISE, CUP: I59C20000340009
  2. Agenda
     ▶ What is temporal data?
     ▶ Period temporal data in Postgres
     ▶ Time series data in Postgres
     ▶ What are we working on?
     This talk will only provide a glimpse. If you are interested in more details, I am happy to talk to you during the conference!
  3. Temporal data
     Temporal data can be found in many applications
     ▶ HR contracts
     ▶ Insurance policies
     ▶ Tourism data
     ▶ Medical domain
     ▶ Stock market data
     ▶ Industrial data
  4. What is temporal data?
     Data with a "timestamp"; the "timestamp" indicates the validity of the data
     Examples:
     ▶ A contract with a validity period
     ▶ A sensor reading with the measurement time
     ▶ An error event with the time at which it occurred
  5. Basic utilities for date/time in Postgres
     ▶ Postgres provides different date/time datatypes [1]
     ▶ Many functions
       ▶ Operators (+, -)
       ▶ Calendar functions (EXTRACT, date_trunc)
     ▶ Anyone who has worked with dates and time zones will appreciate these
     [1] https://www.postgresql.org/docs/current/datatype-datetime.html
  6. Topic of today
     Today is about temporal data, not just storing dates or times
     ▶ Period temporal data
       ▶ Contracts
       ▶ Manufacturing periods
       ▶ Error states
     ▶ Time series data
       ▶ Sensor readings
       ▶ Stock market data
       ▶ Error events
     Let's have a peek at what Postgres and its ecosystem have to offer!
  7. Highlights for period temporal data in Postgres
     ▶ Postgres provides range types [2] for managing period data
     ▶ What are range types?
       ▶ Datatypes for periods '[start, end)'
       ▶ Can have different forms: '[ , )', '[ , ]', '( , ]', '( , )'
       ▶ Available for different types, e.g., INT, NUMERIC, DATE
     ▶ Many predicates and functions
     ▶ Indices available (GiST, SP-GiST, btree_gist)
     ▶ Very easy to use
     ▶ Avoid many programming mistakes
     [2] https://www.postgresql.org/docs/current/rangetypes.html
  8. An example
     Product prices that change over time

         CREATE TABLE prices(
           product INT,
           period  DATERANGE,
           value   FLOAT
         );

         INSERT INTO prices VALUES
           (1, '[2021-08-01, 2022-08-01)', 25),
           (1, '[2022-08-01,)', 30),
           (2, '[2021-08-01, 2022-04-01)', 10),
           (2, '[2022-04-01,)', 20);

          product |         period          | value
         ---------+-------------------------+-------
                1 | [2021-08-01,2022-08-01) |    25
                1 | [2022-08-01,)           |    30
                2 | [2021-08-01,2022-04-01) |    10
                2 | [2022-04-01,)           |    20
  9. Common queries
     ▶ What are the prices of products today?
         WHERE period @> CURRENT_DATE
     ▶ What were the prices of products on 2021-10-30? (The literal is typed as DATE so that it is read as a single day rather than as a range literal.)
         WHERE period @> DATE '2021-10-30'
     ▶ What were the previous prices of products?
         WHERE period << daterange(CURRENT_DATE, NULL, '[)')
     ▶ What were the prices of products between 2021-10-30 and 2022-10-30?
         WHERE period && DATERANGE('2021-10-30', '2022-10-30', '[]')
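The operators used in these queries can be modelled outside the database. The following is a minimal Python sketch, not PostgreSQL's implementation: the `Period` class and its method names are hypothetical, it only covers half-open periods `[start, end)` with an optional unbounded upper end, and it ignores empty ranges and unbounded lower ends.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Period:
    """Half-open period [start, end); end=None means no upper bound."""
    start: date
    end: Optional[date] = None

    def contains(self, day: date) -> bool:
        # Models `period @> day`: the start is included, the end is excluded.
        return self.start <= day and (self.end is None or day < self.end)

    def overlaps(self, other: "Period") -> bool:
        # Models `period && other`: each period starts before the other ends.
        return ((other.end is None or self.start < other.end)
                and (self.end is None or other.start < self.end))

    def strictly_left_of(self, other: "Period") -> bool:
        # Models `period << other`: this period ends before the other starts.
        return self.end is not None and self.end <= other.start


# The rows of the example table: (product, period, value).
prices = [
    (1, Period(date(2021, 8, 1), date(2022, 8, 1)), 25.0),
    (1, Period(date(2022, 8, 1)), 30.0),
    (2, Period(date(2021, 8, 1), date(2022, 4, 1)), 10.0),
    (2, Period(date(2022, 4, 1)), 20.0),
]


def prices_on(day: date) -> dict:
    """Price per product on a given day (the `@>` query)."""
    return {product: value for product, period, value in prices
            if period.contains(day)}
```

Because the periods are half-open, a day on a boundary such as 2022-08-01 belongs to exactly one period per product, which is what makes the `[start, end)` form convenient.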
  10. Uniqueness Constraints
     Ensure a product does not have two prices at the same time

         -- "product WITH =" on an INT column needs the btree_gist extension:
         -- CREATE EXTENSION btree_gist;
         CREATE TABLE prices(
           product INT,
           period  DATERANGE,
           value   FLOAT,
           EXCLUDE USING GIST (product WITH =, period WITH &&)
         );

          product |         period          | value
         ---------+-------------------------+-------
                1 | [2021-08-01,2022-08-01) |    25
                1 | [2022-08-01,)           |    30
                2 | [2021-08-01,2022-04-01) |    10
                2 | [2022-04-01,)           |    20

         INSERT INTO prices VALUES (1, '[2022-08-04,)', 100);

         ERROR: conflicting key value violates exclusion constraint ...
         DETAIL: Key (product, period)=(1, [2022-08-04,)) conflicts ...
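What the exclusion constraint checks can be sketched in a few lines of Python. This is only an illustration of the rule "same product and overlapping period": the function names are hypothetical, periods are modelled as `(start, end)` tuples with half-open semantics and `end=None` for an unbounded end, and the linear scan stands in for the GiST index lookup that Postgres actually performs.

```python
from datetime import date


def overlaps(a, b):
    """True if the half-open periods a and b share at least one day."""
    a_start, a_end = a
    b_start, b_end = b
    return ((b_end is None or a_start < b_end)
            and (a_end is None or b_start < a_end))


def find_conflict(rows, product, period):
    """Return the first stored period that would violate the constraint.

    rows is a list of (product, period) pairs; None means the insert is fine.
    """
    for existing_product, existing_period in rows:
        if existing_product == product and overlaps(existing_period, period):
            return existing_period
    return None


rows = [
    (1, (date(2021, 8, 1), date(2022, 8, 1))),
    (1, (date(2022, 8, 1), None)),
    (2, (date(2021, 8, 1), date(2022, 4, 1))),
    (2, (date(2022, 4, 1), None)),
]

# The rejected insert from the slide: product 1 from 2022-08-04 onwards.
conflict = find_conflict(rows, 1, (date(2022, 8, 4), None))
```

Here `conflict` is the open-ended period starting on 2022-08-01, the row that already covers 2022-08-04 for product 1.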
  11. Take home messages
     ▶ Range types are Postgres' native period datatype
       ▶ Convenient representation of periods
       ▶ Many base datatypes are supported
       ▶ Support different period definitions if needed
     ▶ Many convenient predicates and functions
       ▶ Less error-prone than custom builds
       ▶ Can be sped up using GiST indices
     ▶ Uniqueness constraints available
       ▶ Avoid inconsistencies at the source
  12. Highlights for time series data in Postgres
     ▶ TimescaleDB can be used to manage time series in Postgres
     ▶ What is TimescaleDB?
       ▶ TimescaleDB is a Postgres extension (based on UDFs)
       ▶ Runs on the server side
     ▶ License (two versions of TimescaleDB with different support) [3]
       ▶ TimescaleDB Apache 2 Edition (Apache 2.0 license)
       ▶ TimescaleDB Community Edition (Timescale License, TSL)
       ▶ See https://docs.timescale.com/timescaledb/latest/timescaledb-edition-comparison
     ▶ Available for most platforms as a binary or compiled from source
     [3] Thanks to Chris Mair from 1006.org for pointing this out during a previous talk!
  13. What does TimescaleDB do?
     Eases the management of time series data
     ▶ Convenient time-series-specific functions (hyperfunctions)
       ▶ Gap-filling and interpolation
       ▶ Weighted averages
       ▶ ...
     ▶ Partitioning (hypertables)
       ▶ Access less data (faster runtime)
     ▶ Compression
       ▶ Make data smaller (also faster runtime)
  14. Hyperfunctions/1
  15. Hyperfunctions/2
     Produce a value every five minutes and interpolate missing ones

         SELECT time_bucket_gapfill('5 minutes', time) AS five_min,
                avg(value) AS value,        -- average from data
                interpolate(avg(value))     -- interpolate average if missing
         FROM sensor_signal
         WHERE sensor_id = 3
           AND time BETWEEN now() - INTERVAL '20 min' AND now()
         GROUP BY five_min
         ORDER BY five_min;

               five_min       | value | interpolate
         ---------------------+-------+-------------
          2022-11-11 15:40:00 |  16.2 |        16.2
          2022-11-11 15:45:00 |       |          16
          2022-11-11 15:50:00 |  15.8 |        15.8
          2022-11-11 15:55:00 |       |        11.9
          2022-11-11 16:00:00 |     8 |           8
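To see where the interpolated values 16 and 11.9 come from, the gap-filling logic can be re-created by hand. The sketch below is a plain-Python approximation and not TimescaleDB code: the function names, the bucket origin `ORIGIN`, and the exact reading timestamps are assumptions made for illustration, and the readings are assumed to be already filtered to one sensor and one time window.

```python
from datetime import datetime, timedelta

# Assumed alignment origin for buckets; any midnight works for 5-minute steps.
ORIGIN = datetime(2000, 1, 1)


def bucket_of(ts, step):
    """Start of the fixed-width bucket that contains ts."""
    return ORIGIN + ((ts - ORIGIN) // step) * step


def gapfill_interpolate(readings, first, last, step):
    """Return (bucket, average or None, interpolated) for every bucket.

    readings is a list of (timestamp, value); first and last delimit the
    output window. Empty buckets get a linearly interpolated value when a
    non-empty bucket exists on both sides, otherwise None.
    """
    groups = {}
    for ts, value in readings:
        groups.setdefault(bucket_of(ts, step), []).append(value)
    averages = {b: sum(vals) / len(vals) for b, vals in groups.items()}

    rows = []
    bucket = bucket_of(first, step)
    while bucket <= last:
        avg = averages.get(bucket)
        interpolated = avg
        if avg is None:
            prev = max((b for b in averages if b < bucket), default=None)
            nxt = min((b for b in averages if b > bucket), default=None)
            if prev is not None and nxt is not None:
                # Linear interpolation weighted by the position in time.
                weight = (bucket - prev) / (nxt - prev)
                interpolated = averages[prev] + weight * (averages[nxt] - averages[prev])
        rows.append((bucket, avg, interpolated))
        bucket += step
    return rows


# Hypothetical readings chosen to match the slide's output table.
readings = [
    (datetime(2022, 11, 11, 15, 41), 16.2),
    (datetime(2022, 11, 11, 15, 52), 15.8),
    (datetime(2022, 11, 11, 16, 0), 8.0),
]
rows = gapfill_interpolate(
    readings,
    first=datetime(2022, 11, 11, 15, 40),
    last=datetime(2022, 11, 11, 16, 0),
    step=timedelta(minutes=5),
)
```

The 15:45 bucket lies halfway between 15:40 (16.2) and 15:50 (15.8), giving 16; the 15:55 bucket lies halfway between 15.8 and 8, giving 11.9.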
  16. Hypertables/1
     [Figure: picture taken from timescale.com]
  17. Hypertables/2
     Transform our table into a hypertable

         SELECT create_hypertable(
           'sensor_signal', 'time',
           chunk_time_interval => INTERVAL '2 days',
           partitioning_column => 'sensor_id',
           number_partitions => 2,
           if_not_exists => true,
           migrate_data => true
         );

     ▶ Partition by range on time every two days
     ▶ Partition by hash on sensor_id using 2 partitions
  18. Hypertables/3
     ▶ Be careful with the partitioning
       ▶ Relevant partitions are merged using UNION ALL
       ▶ New data keeps on adding partitions
     ▶ Example: 100 sensors and 3 years of data
         chunk_time_interval => INTERVAL '7 days'
         number_partitions => 50
       Result: potentially 3 · 52 · 50 = 7800 tables!!
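The arithmetic behind this warning is easy to script when choosing partitioning parameters. The helper below is a hypothetical back-of-the-envelope estimate, not a TimescaleDB function: it multiplies the number of time slices by the number of hash partitions, which is an upper bound because a chunk only exists once data falls into it.

```python
import math


def estimated_chunks(days_of_data, chunk_interval_days, number_partitions):
    """Upper bound on the number of chunk tables of a hypertable."""
    time_slices = math.ceil(days_of_data / chunk_interval_days)
    return time_slices * number_partitions


# The slide's example, counting a year as 52 weeks: 3 * 52 * 50.
slide_example = estimated_chunks(3 * 52 * 7, 7, 50)

# The settings from the create_hypertable call (2-day chunks, 2 partitions)
# over three calendar years of 365 days.
two_day_chunks = estimated_chunks(3 * 365, 2, 2)
```

With these inputs `slide_example` is 7800 and `two_day_chunks` is 1096, so fewer hash partitions with short chunks can still produce far fewer tables than many hash partitions with weekly chunks.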
  19. Compression
     ▶ Compression aims at reducing the size of the data
     ▶ Done at a per-chunk (partition) level
     ▶ Usually also improves query time
     ▶ Transparent to the user
     ▶ Done via a TimescaleDB function
  20. Take home messages
     ▶ Timescale handles time series data transparently
       ▶ For you it is just a relation
       ▶ SQL will still work as before
     ▶ Use hyperfunctions
       ▶ Handy and much faster than custom builds
       ▶ They keep on improving
     ▶ Use hypertables
       ▶ Limit the search space
       ▶ But be careful with how to partition
     ▶ Use compression
       ▶ Improves performance substantially
       ▶ Should be used on (old) read-only data
  21. What are we working on?
     ▶ Period temporal data (project: ISTeP [4])
       ▶ Temporal range and overlap joins
       ▶ Temporal anomalies in healthcare information systems
       ▶ Temporal key/foreign key constraints
       ▶ Temporal histograms for cardinality estimation
     ▶ Time series data (project: PREMISE [5])
       ▶ Predictive maintenance for industrial equipment
       ▶ Data ingestion infrastructure
       ▶ Data storage infrastructure
       ▶ Feature extraction
     [4] https://dbs.inf.unibz.it/projects/istep/
     [5] https://dbs.inf.unibz.it/projects/premise/
  22. Thank you!
     anton.dignoes@unibz.it
