4. Of every type
• Regular: Machines and sensors
• Irregular: Web and machine events
• Forward-looking: Logistics and forecasting
• Derived data: Inferences from AI/ML models
7. Existing databases don’t work for time series
• Relational databases: hard to scale
• NoSQL databases: every other time-series database today is NoSQL; they underperform on complex queries, are hard to use, and lead to data silos
12. Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (premium LRS storage)
Each row has 12 columns (1 timestamp, 1 indexed host ID, 10 metrics)
Hard to scale
14.–17. B-tree Insert Pain
[Diagram sequence: batches are inserted into a B-tree (node values include 1, 5, 10, 13, 17, 20, 24, 25, 29) with a memory capacity of only 2 nodes. Each batch lands in a different part of the tree, so in-memory nodes must repeatedly be written to disk and re-read.]
18. Challenge in scaling up
• Indexes write to random parts of B-tree
• As table grows large
– Indexes no longer fit in memory
– Random writes cause swapping
[Diagram: two inserts at the same instant, Device A and Device Z at 01:01:01, land at opposite ends of a (Device, Time DESC) index.]
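The random-write problem above can be reproduced on vanilla Postgres with the composite index from the diagram; a minimal sketch (table and column names are illustrative):

```sql
-- Plain Postgres table with the composite index from the slide.
-- Once the index outgrows memory, simultaneous inserts for
-- devices A..Z land on widely separated B-tree pages,
-- turning sequential ingest into random I/O and swapping.
CREATE TABLE readings (
    device text,
    time   timestamptz,
    value  float
);
CREATE INDEX ON readings (device, time DESC);
```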
20. TimescaleDB: scalable time-series database, full SQL, packaged as a PostgreSQL extension

Scale & Performance
• Ingest millions of datapoints per second
• Scale to 100s of billions of rows
• Elastically scale up and out
• Faster than Influx, Cassandra, Mongo, vanilla Postgres

Proven & Enterprise Ready
• Inherits 20+ years of PostgreSQL reliability
• Streaming replication, HA, backup/recovery
• Data lifecycle: continuous rollups, retention, archiving
• Enterprise-grade security

SQL for Time Series
• Zero learning curve
• Zero friction: existing tools and connectors work
• Enrich understanding: JOIN against relational data
• Freedom for data model, no cardinality issues
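"Packaged as a PostgreSQL extension" is literal; loading it is standard Postgres (assuming timescaledb has been added to shared_preload_libraries):

```sql
-- TimescaleDB installs like any other PostgreSQL extension:
CREATE EXTENSION IF NOT EXISTS timescaledb;
```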
21. >20x faster than PostgreSQL on batch inserts: 1.11M metrics/s
TimescaleDB 0.5, Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (LRS storage)
Each row has 12 columns (1 timestamp, 1 indexed host ID, 10 metrics)
22. TimescaleDB vs. PostgreSQL speedup
Table scans, simple column rollups    ~0-20%
GROUP BYs                             20-200%
Time-ordered GROUP BYs                400-10,000x
DELETEs                               2,000x
TimescaleDB 0.5, Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (LRS storage)
Each row has 12 columns (1 timestamp, 1 indexed host ID, 10 metrics)
24. NoSQL champion: Log-Structured Merge Trees
• Key-value store with indexed key lookups at high write rates
• Compressed data storage
• Common approach for time series: use key <name, tags, field, time>
25. NoSQL + LSMTs Come at a Cost
• Significant memory overhead
• Lack of secondary indexes / tag lock-in
• Less powerful queries
• Weaker consistency (no ACID)
• No JOINs
• Loss of SQL ecosystem
39. A hypertable: many chunks, but treated like a single table
• Indexes
• Triggers
• Constraints
• Foreign keys
• UPSERTs
• Table mgmt
40. TimescaleDB: Easy to Get Started
CREATE TABLE conditions (
time timestamptz,
temp float,
humidity float,
device text
);
SELECT create_hypertable('conditions', 'time', 'device', 4,
  chunk_time_interval => interval '1 week');
INSERT INTO conditions
VALUES ('2017-10-03 10:23:54+01', 73.4, 40.7, 'sensor3');
SELECT * FROM conditions;
time | temp | humidity | device
------------------------+------+----------+---------
2017-10-03 11:23:54+02 | 73.4 | 40.7 | sensor3
41. Hypertables create partitions automatically at runtime, avoiding manual work like the following:
CREATE TABLE conditions (
time timestamptz,
temp float,
humidity float,
device text
);
CREATE TABLE conditions_p1 PARTITION OF conditions
FOR VALUES FROM (MINVALUE) TO ('g')
PARTITION BY RANGE (time);
CREATE TABLE conditions_p2 PARTITION OF conditions
FOR VALUES FROM ('g') TO ('n')
PARTITION BY RANGE (time);
CREATE TABLE conditions_p3 PARTITION OF conditions
FOR VALUES FROM ('n') TO ('t')
PARTITION BY RANGE (time);
CREATE TABLE conditions_p4 PARTITION OF conditions
FOR VALUES FROM ('t') TO (MAXVALUE)
PARTITION BY RANGE (time);
-- Create time partitions for the first week in each device partition
CREATE TABLE conditions_p1_y2017m10w01 PARTITION OF conditions_p1
FOR VALUES FROM ('2017-10-01') TO ('2017-10-07');
CREATE TABLE conditions_p2_y2017m10w01 PARTITION OF conditions_p2
FOR VALUES FROM ('2017-10-01') TO ('2017-10-07');
CREATE TABLE conditions_p3_y2017m10w01 PARTITION OF conditions_p3
FOR VALUES FROM ('2017-10-01') TO ('2017-10-07');
CREATE TABLE conditions_p4_y2017m10w01 PARTITION OF conditions_p4
FOR VALUES FROM ('2017-10-01') TO ('2017-10-07');
-- Create time index on each leaf partition
CREATE INDEX ON conditions_p1_y2017m10w01 (time);
CREATE INDEX ON conditions_p2_y2017m10w01 (time);
CREATE INDEX ON conditions_p3_y2017m10w01 (time);
CREATE INDEX ON conditions_p4_y2017m10w01 (time);
INSERT INTO conditions VALUES ('2017-10-03 10:23:54+01',
  73.4, 40.7, 'sensor3');
44. Single node: scaling up by adding disks
How: chunks spread across many disks (elastically!), either RAIDed or via distinct tablespaces
Benefit: faster inserts, parallelized queries
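The tablespace route can be sketched with TimescaleDB's attach_tablespace API (available in later releases than the 0.5 shown in the benchmarks; paths, tablespace names, and the 'conditions' hypertable are assumptions):

```sql
-- Each disk becomes a Postgres tablespace; attach them to the
-- hypertable and new chunks are spread across them elastically.
CREATE TABLESPACE disk1 LOCATION '/mnt/disk1/pgdata';
CREATE TABLESPACE disk2 LOCATION '/mnt/disk2/pgdata';
SELECT attach_tablespace('disk1', 'conditions');
SELECT attach_tablespace('disk2', 'conditions');
```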
46. Multi-node: Scaling out across sharded primaries
(Under development)
• Chunks spread across servers
• Insert/query to any server
• Distributed query optimizations
(push-down LIMITs and aggregates, etc.)
48. Avoid querying chunks via constraint exclusion
SELECT time, temp FROM data
WHERE time > now() - interval '7 days'
AND device_id = '12345'
49. Avoid querying chunks via constraint exclusion
SELECT time, device_id, temp FROM data
WHERE time > '2017-08-22 18:18:00+00'
50. Avoid querying chunks via constraint exclusion
SELECT time, device_id, temp FROM data
WHERE time > now() - interval '24 hours'
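Whether exclusion actually kicked in can be checked with EXPLAIN (a sketch; the exact plan output varies by version):

```sql
EXPLAIN SELECT time, device_id, temp FROM data
WHERE time > now() - interval '24 hours';
-- The plan should scan only the chunk(s) covering the last
-- 24 hours, not every chunk in the hypertable.
```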
51. Additional time-based query optimizations
CREATE INDEX ON readings(time);
SELECT date_trunc('minute', time) as bucket,
       avg(cpu)
FROM readings
GROUP BY bucket
ORDER BY bucket DESC
LIMIT 10;
PG doesn't know to use the index for this query; Timescale understands time.
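TimescaleDB's own bucketing function, time_bucket(), is the time-aware counterpart to date_trunc() for queries like this; a sketch of the same rollup:

```sql
SELECT time_bucket('1 minute', time) AS bucket,
       avg(cpu)
FROM readings
GROUP BY bucket
ORDER BY bucket DESC
LIMIT 10;
-- The planner can walk the time index backwards and stop
-- after 10 buckets instead of scanning the whole table.
```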
52. Global queries but local indexes
• Constraint exclusion selects chunks globally
• Local indexes speed up queries on chunks
– B-tree, Hash, GiST, SP-GiST, GIN and BRIN
– Secondary and composite columns, UNIQUE* constraints
53. Optimized for many chunks
• Faster chunk exclusion
– Avoid opening / gathering stats on all chunks during constraint exclusion:
decreased planning time on 4,000 chunks from 600ms to 36ms
• Better LIMITs across chunks
– Avoid requiring one+ tuple per chunk during MergeAppend / LIMIT
54. “ We've been using TimescaleDB for over a year to
store all kinds of sensor and telemetry data as part of
our Power Management database.
We've scaled to 500 billion rows and the performance
we're seeing is monstrous, almost 70% faster queries.”
- Sean Wallace, Software Engineer
500B rows · 400K rows/sec · 50K chunks · 5min intervals
55. Efficient retention policies
SELECT time, device_id, temp FROM data
WHERE time > now() - interval '24 hours'
Drop chunks, don't delete rows: avoids vacuuming
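The retention call itself, in the 0.x-era API used throughout these slides (newer releases use a different, named-argument signature):

```sql
-- Drop all chunks older than 24 hours in one metadata operation;
-- no row-by-row DELETE, so nothing for VACUUM to reclaim.
SELECT drop_chunks(interval '24 hours', 'conditions');
```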
60. Data Retention + Aggregations
Granularity:  raw      15 min    day
Retention:    1 week   1 month   forever
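Built-in continuous aggregates were not yet available at this point, so the 15-minute tier can be materialized by hand; a sketch assuming a second hypertable named conditions_15min:

```sql
-- Periodically roll raw rows up into a coarser hypertable,
-- then let chunk-based retention drop the raw data after a week.
INSERT INTO conditions_15min
SELECT time_bucket('15 minutes', time) AS bucket,
       device,
       avg(temp)     AS avg_temp,
       avg(humidity) AS avg_humidity
FROM conditions
WHERE time > now() - interval '1 day'
GROUP BY bucket, device;
```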
61. Unlock the richness of your monitoring data
Prometheus → Remote Storage Adapter + pg_prometheus → TimescaleDB + PostgreSQL → Grafana
62. pg_prometheus
Prometheus Data Model in TimescaleDB / PostgreSQL
CREATE TABLE metrics (sample prom_sample);
INSERT INTO metrics
VALUES ('cpu_usage{service="nginx",host="machine1"} 34.6 1494595898000');
• Scrape metrics with curl:
curl http://myservice/metrics | grep -v "^#" | psql -c "COPY metrics FROM STDIN"
• New data type prom_sample: <time, name, value, labels>
63. Automate normalized storage
SELECT create_prometheus_table('metrics');

Values table:
Time     | Name | Value | Label Id
---------+------+-------+---------
01:02:00 | CPU  |    90 | 1
01:03:00 | Mem  |  1024 | 1
01:04:00 | CPU  |    70 | 2
01:04:00 | Mem  |   900 | 2
01:04:00 | IO   |    70 | 5

Labels table:
Id | Label
---+-----------------
1  | {host: "h001"}
2  | {host: "h002"}
3  | {host: "1984"}
4  | {host: "super"}
5  | {host: "marshal"}

Labels stored in separate host metadata table
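A query joining the two tables back together might look like this (table and column names are assumptions about pg_prometheus's normalized schema, not taken from the slides):

```sql
SELECT v.time, v.value, l.labels
FROM metrics_values v
JOIN metrics_labels l ON v.labels_id = l.id
WHERE l.metric_name = 'cpu_usage'
  AND l.labels @> '{"host": "h001"}';
```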
64. Easily query auto-created view
SELECT sample
FROM metrics
WHERE time > NOW() - interval '10 min' AND
      name = 'cpu_usage' AND
      labels @> '{"service": "nginx"}';
Columns: | sample | time | name | value | labels |