7. The nature of Big Data
1. Time series data: date, numeric metrics
• Mostly append-only
• Huge volume of inserts
2. Analytical data: date, dimensions, measures
• A lot of dimensions
• Dimensions are subject to updates
• Moderate ingestion speed
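As a rough sketch of the two shapes (table and column names are hypothetical), the difference could look like:

```sql
-- Time series data: append-only, date + numeric metrics
CREATE TABLE cpu_metrics (
    ts        timestamp NOT NULL,
    host_id   int       NOT NULL,
    cpu_usage numeric,
    mem_usage numeric
);

-- Analytical data: dates, dimensions (updatable), measures
CREATE TABLE sales_facts (
    sale_date date NOT NULL,
    region    text,    -- dimension, may be updated later
    product   text,    -- dimension, may be updated later
    amount    numeric, -- measure
    quantity  int      -- measure
);
```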
8. Big Data data models
1. Star schema, normalized
2. Wide tables
3. Nested structures
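A minimal sketch of the three models (all table and column names are hypothetical):

```sql
-- 1. Star schema: normalized fact table referencing dimension tables
CREATE TABLE dim_product (product_id int PRIMARY KEY, name text, category text);
CREATE TABLE fact_sales  (sale_date date, product_id int REFERENCES dim_product, amount numeric);

-- 2. Wide table: dimensions denormalized into the fact table itself
CREATE TABLE sales_wide (sale_date date, product_name text, category text, amount numeric);

-- 3. Nested structures: e.g. dimensions stored as a nested JSON document
CREATE TABLE sales_nested (sale_date date, doc jsonb);
```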
10. Postgres
1. Classical RDBMS
2. Very good for OLTP
3. Our experience: max DB size ~5 TB
4. Declarative partitioning works well from PG 12
5. Row-based storage
6. Search conditions must be predefined (to match indexes)
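A minimal sketch of declarative partitioning (table and column names are hypothetical; the feature appeared in PG 10 and matured by PG 12):

```sql
-- Range-partitioned parent table
CREATE TABLE events (
    created_at timestamp NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

-- One partition per month; old months can be detached/dropped cheaply
CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
```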
14. Citus
1. Postgres extension
2. Owned by Microsoft
3. MPP tailored for OLTP: sharding for fast ingestion
4. Vendor positions it for OLAP as well – in practice, not true
5. SQL works perfectly when extracting data by shard key
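A sketch of the shard-key point, assuming a hypothetical `events` table sharded by `tenant_id` (Citus's `create_distributed_table` is the real API):

```sql
-- Distribute the table across worker nodes by a shard key
SELECT create_distributed_table('events', 'tenant_id');

-- Fast: routed to a single shard by the shard key (the OLTP case)
SELECT count(*) FROM events WHERE tenant_id = 42;

-- Slow: fans out across every shard (the OLAP case the vendor oversells)
SELECT tenant_id, count(*) FROM events GROUP BY tenant_id;
```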
15. Greenplum
1. Postgres fork
2. MPP database optimized for OLAP workloads
3. Excellent partitioning with automatic data lifecycle management
4. A lot of compression algorithms
5. Very fast at cluster sizes up to ~100 TB
6. A lot of integrations: files, HDFS
7. Excellent SQL support – great for analytical queries
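A sketch of a typical Greenplum table combining these features (names are hypothetical; exact storage options vary by version, e.g. GP6 spells `appendoptimized` as `appendonly`):

```sql
CREATE TABLE events (
    event_id   bigint,
    created_at date,
    metric     numeric
)
-- Columnar, compressed, append-optimized storage
WITH (appendoptimized = true, orientation = column,
      compresstype = zstd, compresslevel = 5)
-- MPP distribution across segments
DISTRIBUTED BY (event_id)
-- Monthly partitions for data lifecycle management
PARTITION BY RANGE (created_at)
    (START (date '2024-01-01') END (date '2025-01-01')
     EVERY (INTERVAL '1 month'));
```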
16. Big data platforms
1. Set of open-source components
2. Components for storage, processing, and management
3. Storage components differ by tier:
- in-memory for hot data
- MPP for warm data
- Hadoop for cold data
18. ClickHouse
1. DBMS written from scratch
2. Written by developers for developers – Yandex.Metrica
3. Wide-table-optimized MPP columnar database
4. Extremely fast batch ingestion + in-memory Buffer tables
5. Very fast queries over wide tables
6. Aggregations during ingestion + real-time materialized views
7. Clusters of hundreds of TB
8. Huge number of integrations: files, HDFS, Kafka, JDBC
9. Self-sufficient for ELT/ETL
10. Own SQL dialect
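A sketch of the ingestion-time aggregation point, in ClickHouse's SQL dialect (table and column names are hypothetical):

```sql
-- Wide MergeTree table for raw events
CREATE TABLE hits (
    event_date  Date,
    user_id     UInt64,
    url         String,
    duration_ms UInt32
) ENGINE = MergeTree
ORDER BY (event_date, user_id);

-- Materialized view: the aggregate is updated during ingestion,
-- so the daily rollup is available in real time
CREATE MATERIALIZED VIEW hits_daily
ENGINE = SummingMergeTree
ORDER BY event_date
AS SELECT event_date, count() AS hits, sum(duration_ms) AS total_ms
FROM hits GROUP BY event_date;
```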