SlideShare une entreprise Scribd logo
1  sur  44
Télécharger pour lire hors ligne
Presenter Bios
Adam Brown -- Head of
Technology and Architecture
Co-founder of Mux with extensive
experience in video encoding and
delivery going back to Zencoder
Robert Hodges - Altinity CEO
30+ years on DBMS plus
virtualization and security.
ClickHouse is DBMS #20
Company Intros
www.altinity.com
Leading software and services
provider for ClickHouse
Major committer and community
sponsor in US and Western Europe
mux.com
Mux Video is an API-first platform,
powered by data and designed by
video experts to make beautiful video
possible for every development team.
Data and Video
Delivery
Standard Media Workflow (w/o Mux)
Standard Media Workflow
Payments before Stripe Messaging before Twilio
Payments using Stripe Messaging using Twilio
Standard Media Workflow (w/o Mux)
Mux Products
Mux Video Mux Data
Mux Data Overview
Data Sources
● Mux Data SDKS
● CDN Logs
● Internal Monitoring
SDKs
Example Use Case
15
Data drives video engine
ClickHouse
features that
enable Mux
Introducing the MergeTree table engine
CREATE TABLE ontime (
Year UInt16,
Quarter UInt8,
Month UInt8,
...
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(FlightDate)
ORDER BY (Carrier, FlightDate)
Table engine type
How to break data
into parts
How to index and
sort data in each part
MergeTree layout within a single part
/var/lib/clickhouse/data/airline/ontime_reordered
2017-01-01 AA
2017-01-01 EV
2018-01-01 UA
2018-01-02 AA
...
primary.idx
||||
.mrk2 .bin
20170701_20170731_355_355_2/
(FlightDate, Carrier...) ActualElapsedTime Airline AirlineID...
||||
.mrk2 .bin
||||
.mrk2 .bin
Granule
Compressed
BlockMark
Compression and codecs are configurable
CREATE TABLE test_codecs (
a_lz4 String CODEC(LZ4),
a_zstd String DEFAULT a_lz4 CODEC(ZSTD),
a_lc_lz4 LowCardinality(String) DEFAULT a_lz4 CODEC(LZ4),
a_lc_zstd LowCardinality(String) DEFAULT a_lz4 CODEC(ZSTD)
)
Engine = MergeTree
PARTITION BY tuple() ORDER BY tuple();
Effect on storage size is dramatic
20.84% 12.28%
10.61%
10.65%
7.89%
How do you get those nice numbers?
SELECT name AS col,
sum(data_uncompressed_bytes) AS uncompressed,
sum(data_compressed_bytes) AS compressed,
round((compressed / uncompressed) * 100., 2) AS pct,
bar(pct, 0., 100., 20) AS bar_graph
FROM system.columns
WHERE (database = currentDatabase()) AND (table = 'test_codecs')
GROUP BY name ORDER BY pct DESC, name ASC
┌─col───────┬─uncompressed─┬─compressed─┬────pct─┬─bar_graph────────────┐
│ a_lc_lz4 │ 200446439 │ 201154656 │ 100.35 │ ████████████████████ │
│ a_lc_zstd │ 200446439 │ 148996404 │ 74.33 │ ██████████████▋ │
│ a_lz4 │ 1889003550 │ 393713202 │ 20.84 │ ████ │
│ a_zstd │ 1889003550 │ 231975292 │ 12.28 │ ██▍ │
└───────────┴──────────────┴────────────┴────────┴──────────────────────┘
Materialized views restructure/reduce data
cpu
Table
Ingest
All CPU measurements Last measurement by CPU
cpu_last_point_agg
SummingMergeTree
(Trigger)
Compressed: ~0.0009%
Uncompressed: ~0.002%
cpu_last_point_mv
Materialized View
CREATE MATERIALIZED VIEW cpu_last_point_mv
TO cpu_last_point_agg
AS SELECT
cpu_id,
maxState(created_at) AS max_created_at,
. . .
argMaxState(usage_idle, created_at) AS usage_idle
FROM cpu GROUP BY cpu_id
Pattern: TTLs + downsampled views
CREATE TABLE web_visits (
time DateTime CODEC(DoubleDelta,
ZSTD),
session_id UInt64,
visitor_id UInt64, . . .
) Engine = MergeTree
PARTITION BY toDate(time)
ORDER BY (session_id, time)
TTL time + INTERVAL 7 DAY
web_visitsIngest hourly_uniq_visits
hourly_sessions
Ephemeral source data Long-term aggregates
SET allow_experimental_data_skipping_indices=1;
ALTER TABLE ontime ADD INDEX
dest_name Dest TYPE ngrambf_v1(3, 512, 2, 0) GRANULARITY 1
ALTER TABLE ontime ADD INDEX
cname Carrier TYPE set(0) GRANULARITY 1
OPTIMIZE TABLE ontime FINAL
-- OR, in current releases
ALTER TABLE ontime
UPDATE Dest=Dest, Carrier=Carrier
WHERE 1=1
Skip indexes cut down on I/O
Default value
Effectiveness depends on data distribution
SELECT
Year, count(*) AS flights,
sum(Cancelled) / flights AS cancelled,
sum(DepDel15) / flights AS delayed_15
FROM airline.ontime WHERE [Column] = [Value] GROUP BY Year
Column Value Index Count Rows Processed Query Response
Dest PPG ngrambf_v1 525 4.30M 0.053
Dest ATL ngrambf_v1 9,360,581 166.81M 0.622
Carrier ML set 70,622 3.39M 0.090
Carrier WN set 25,918,402 166.24M 0.566
More table engines: CollapsingMergeTree
CREATE TABLE collapse (user_id UInt64, views UInt64, sign Int8)
ENGINE = CollapsingMergeTree(sign)
PARTITION BY tuple() ORDER BY user_id;
INSERT INTO collapse VALUES (32, 55, 1);
INSERT INTO collapse VALUES (32, 55, -1);
INSERT INTO collapse VALUES (32, 98, 1);
SELECT * FROM collapse FINAL
┌─user_id─┬─views─┬─sign─┐
│ 32 │ 98 │ 1 │
└─────────┴───────┴──────┘
Mux Experience
with ClickHouse
Mux Data Overview
Video Views
● Full Video Views
● Billions of Views/Month
● 500M Views/Customer
● 100K Beacons/Second
● Raw Data Queries
Historical Metrics Clickhouse
Data Architecture
Citus
Fancy Citus
Super Fancy Citus
2016
2019
Mux Before Clickhouse
● Filter Depth
● Exclusion Filters
● Dynamic time aggregation
● Scale
Unlocked Data Features
Unlocked Performance
Cost Benefits
● No Aggregation (CPU + Disk)
● Columnar compression
○ Low Cardinality
○ ⅓ Disk Size
● Smaller machines
● Halved costs while volume doubled
Challenges
● Clickhouse != Postgres
● View Updates
● Individual record lookup
Updates
CREATE TABLE views (
view_id UUID,
customer_id UUID,
user_id UUID,
view_time DateTime,
...
rebuffer_count: UInt64,
--- other metrics
operating_system: Nullable(String),
--- other filter dimensions
)
ReplicatedCollapsingMergeTree(...,Sign)
PARTITION BY toYYYYMMDD(view_time)
ORDER BY (customer_id, view_time, view_id)
● “Resumed” views
● Low percentage of rows get updated
● FINAL keyword sometimes
○ Yes when listing views
○ No on larger metrics queries
○ Metrics workarounds
■ SUM(metric * Sign) /
SUM(Sign)
● Nightly OPTIMIZE
Record Lookup
SELECT * FROM video_views
WHERE
view_time IN (
SELECT view_time FROM user_id_index
WHERE customer_id=0 AND user_id='mux'
)
AND customer_id=0 AND user_id='mux'
CREATE MATERIALIZED VIEW
video_views_index_user_id
ON CLUSTER metrics
ENGINE = ReplicatedCollapsingMergeTree(...,
Sign)
PARTITION BY toYYYYMMDD(view_end)
PRIMARY KEY (property_id, user_id)
ORDER BY (property_id, user_id, view_end)
POPULATE AS
SELECT
property_id,
user_id,
view_end,
Sign
FROM video_views
https://mux.com/blog/from-russia-with-love-how-clickhouse-saved-our-data/
Nullable
Clickhouse Documentation
● Test it
● We use Nullable extensively
● Minimal Performance Impact
Deployment Details
● K8s
● 4 Clusters/4 Deployments
○ Historical Metrics
■ CH Replication
■ Secondary Cluster
■ 8 Node Clusters
○ Realtime Metrics
■ No Replication
■ Blue/Green Clusters
■ 5 Node Clusters
○ CDN logs
■ No Replication
■ Single Node
■ Increased Kafka retention
○ Raw Data Beacons
■ No Replication
■ 3 Day TTL
● All fronted by chproxy
○ https://github.com/Vertamedia/chproxy
○ Caching
○ User Routing
○ Rate Limiting
○ Prometheus Monitoring
What Next?
● Advanced Alerting
● BI Metrics
● Data Warehousing
● Move “beacon processing” into clickhouse
Wrap-up
Takeaways
● Clickhouse performance feels like magic
● Operational simplicity, especially around scaling
● Clickhouse has become our default for statistic data
Thank you!
We are both
hiring!
Mux:
https://mux.com
Altinity:
https://www.altinity.com

Contenu connexe

Tendances

Tendances (20)

ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...
 
Altinity Cluster Manager: ClickHouse Management for Kubernetes and Cloud
Altinity Cluster Manager: ClickHouse Management for Kubernetes and CloudAltinity Cluster Manager: ClickHouse Management for Kubernetes and Cloud
Altinity Cluster Manager: ClickHouse Management for Kubernetes and Cloud
 
Creating Beautiful Dashboards with Grafana and ClickHouse
Creating Beautiful Dashboards with Grafana and ClickHouseCreating Beautiful Dashboards with Grafana and ClickHouse
Creating Beautiful Dashboards with Grafana and ClickHouse
 
Tiered storage intro. By Robert Hodges, Altinity CEO
Tiered storage intro. By Robert Hodges, Altinity CEOTiered storage intro. By Robert Hodges, Altinity CEO
Tiered storage intro. By Robert Hodges, Altinity CEO
 
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEODangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
 
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
 
ClickHouse Defense Against the Dark Arts - Intro to Security and Privacy
ClickHouse Defense Against the Dark Arts - Intro to Security and PrivacyClickHouse Defense Against the Dark Arts - Intro to Security and Privacy
ClickHouse Defense Against the Dark Arts - Intro to Security and Privacy
 
Webinar slides: Adding Fast Analytics to MySQL Applications with Clickhouse
Webinar slides: Adding Fast Analytics to MySQL Applications with ClickhouseWebinar slides: Adding Fast Analytics to MySQL Applications with Clickhouse
Webinar slides: Adding Fast Analytics to MySQL Applications with Clickhouse
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouse
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
 
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
 
ClickHouse Materialized Views: The Magic Continues
ClickHouse Materialized Views: The Magic ContinuesClickHouse Materialized Views: The Magic Continues
ClickHouse Materialized Views: The Magic Continues
 
Adventures in Observability: How in-house ClickHouse deployment enabled Inst...
 Adventures in Observability: How in-house ClickHouse deployment enabled Inst... Adventures in Observability: How in-house ClickHouse deployment enabled Inst...
Adventures in Observability: How in-house ClickHouse deployment enabled Inst...
 
Data warehouse on Kubernetes - gentle intro to Clickhouse Operator, by Robert...
Data warehouse on Kubernetes - gentle intro to Clickhouse Operator, by Robert...Data warehouse on Kubernetes - gentle intro to Clickhouse Operator, by Robert...
Data warehouse on Kubernetes - gentle intro to Clickhouse Operator, by Robert...
 
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
 
ClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei Milovidov
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
 
Five Great Ways to Lose Data on Kubernetes - KubeCon EU 2020
Five Great Ways to Lose Data on Kubernetes - KubeCon EU 2020Five Great Ways to Lose Data on Kubernetes - KubeCon EU 2020
Five Great Ways to Lose Data on Kubernetes - KubeCon EU 2020
 
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei Milovidov
 

Similaire à Big Data and Beautiful Video: How ClickHouse enables Mux to Deliver Content at Scale

Similaire à Big Data and Beautiful Video: How ClickHouse enables Mux to Deliver Content at Scale (20)

Sprint 72
Sprint 72Sprint 72
Sprint 72
 
Altinity Quickstart for ClickHouse-2202-09-15.pdf
Altinity Quickstart for ClickHouse-2202-09-15.pdfAltinity Quickstart for ClickHouse-2202-09-15.pdf
Altinity Quickstart for ClickHouse-2202-09-15.pdf
 
Activity Recognition project
Activity Recognition projectActivity Recognition project
Activity Recognition project
 
Sprint 88
Sprint 88Sprint 88
Sprint 88
 
Computer Architecture and Organization
Computer Architecture and OrganizationComputer Architecture and Organization
Computer Architecture and Organization
 
Sprint 53
Sprint 53Sprint 53
Sprint 53
 
Apache Druid Design and Future prospect
Apache Druid Design and Future prospectApache Druid Design and Future prospect
Apache Druid Design and Future prospect
 
Bitmovin LIVE Tech Talks: Analytics for Workflow Automation (ft. Touchstream ...
Bitmovin LIVE Tech Talks: Analytics for Workflow Automation (ft. Touchstream ...Bitmovin LIVE Tech Talks: Analytics for Workflow Automation (ft. Touchstream ...
Bitmovin LIVE Tech Talks: Analytics for Workflow Automation (ft. Touchstream ...
 
Sprint 73
Sprint 73Sprint 73
Sprint 73
 
Extra performance out of thin air
Extra performance out of thin airExtra performance out of thin air
Extra performance out of thin air
 
Sprint 31
Sprint 31Sprint 31
Sprint 31
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUs
 
Activity Recognition
Activity RecognitionActivity Recognition
Activity Recognition
 
Sprint 50 review
Sprint 50 reviewSprint 50 review
Sprint 50 review
 
Sprint 60
Sprint 60Sprint 60
Sprint 60
 
AltiGen Cdr Manual
AltiGen Cdr ManualAltiGen Cdr Manual
AltiGen Cdr Manual
 
VictoriaMetrics: Welcome to the Virtual Meet Up March 2023
VictoriaMetrics: Welcome to the Virtual Meet Up March 2023VictoriaMetrics: Welcome to the Virtual Meet Up March 2023
VictoriaMetrics: Welcome to the Virtual Meet Up March 2023
 
Microsoft Code Signing Certificate Best Practice - CodeSignCert.com
Microsoft Code Signing Certificate Best Practice - CodeSignCert.comMicrosoft Code Signing Certificate Best Practice - CodeSignCert.com
Microsoft Code Signing Certificate Best Practice - CodeSignCert.com
 
OSMC 2021 | pg_stat_monitor: A cool extension for better database (PostgreSQL...
OSMC 2021 | pg_stat_monitor: A cool extension for better database (PostgreSQL...OSMC 2021 | pg_stat_monitor: A cool extension for better database (PostgreSQL...
OSMC 2021 | pg_stat_monitor: A cool extension for better database (PostgreSQL...
 
Nvidia® cuda™ 5 sample evaluationresult_2
Nvidia® cuda™ 5 sample evaluationresult_2Nvidia® cuda™ 5 sample evaluationresult_2
Nvidia® cuda™ 5 sample evaluationresult_2
 

Plus de Altinity Ltd

Plus de Altinity Ltd (20)

Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxBuilding an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
 
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceBuilding an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open Source
 
Fun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdfFun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdf
 
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdfCloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
 
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
 
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdfOwn your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
 
ClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom AppsClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom Apps
 
Adventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAdventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree Engine
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
 
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdfAltinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
 
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
 
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdfOSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
 
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
 
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
 
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
 
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
 
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdfOSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
 

Dernier

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Dernier (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 

Big Data and Beautiful Video: How ClickHouse enables Mux to Deliver Content at Scale

  • 1.
  • 2. Presenter Bios Adam Brown -- Head of Technology and Architecture Co-founder of Mux with extensive experience in video encoding and delivery going back to Zencoder Robert Hodges - Altinity CEO 30+ years on DBMS plus virtualization and security. ClickHouse is DBMS #20
  • 3. Company Intros www.altinity.com Leading software and services provider for ClickHouse Major committer and community sponsor in US and Western Europe mux.com Mux Video is an API-first platform, powered by data and designed by video experts to make beautiful video possible for every development team.
  • 7. Payments before Stripe Messaging before Twilio
  • 8. Payments using Stripe Messaging using Twilio
  • 10.
  • 13. Data Sources ● Mux Data SDKS ● CDN Logs ● Internal Monitoring SDKs
  • 17. Introducing the MergeTree table engine CREATE TABLE ontime ( Year UInt16, Quarter UInt8, Month UInt8, ... ) ENGINE = MergeTree() PARTITION BY toYYYYMM(FlightDate) ORDER BY (Carrier, FlightDate) Table engine type How to break data into parts How to index and sort data in each part
  • 18. MergeTree layout within a single part /var/lib/clickhouse/data/airline/ontime_reordered 2017-01-01 AA 2017-01-01 EV 2018-01-01 UA 2018-01-02 AA ... primary.idx |||| .mrk2 .bin 20170701_20170731_355_355_2/ (FlightDate, Carrier...) ActualElapsedTime Airline AirlineID... |||| .mrk2 .bin |||| .mrk2 .bin Granule Compressed BlockMark
  • 19. Compression and codecs are configurable CREATE TABLE test_codecs ( a_lz4 String CODEC(LZ4), a_zstd String DEFAULT a_lz4 CODEC(ZSTD), a_lc_lz4 LowCardinality(String) DEFAULT a_lz4 CODEC(LZ4), a_lc_zstd LowCardinality(String) DEFAULT a_lz4 CODEC(ZSTD) ) Engine = MergeTree PARTITION BY tuple() ORDER BY tuple();
  • 20. Effect on storage size is dramatic 20.84% 12.28% 10.61% 10.65% 7.89%
  • 21. How do you get those nice numbers? SELECT name AS col, sum(data_uncompressed_bytes) AS uncompressed, sum(data_compressed_bytes) AS compressed, round((compressed / uncompressed) * 100., 2) AS pct, bar(pct, 0., 100., 20) AS bar_graph FROM system.columns WHERE (database = currentDatabase()) AND (table = 'test_codecs') GROUP BY name ORDER BY pct DESC, name ASC ┌─col───────┬─uncompressed─┬─compressed─┬────pct─┬─bar_graph────────────┐ │ a_lc_lz4 │ 200446439 │ 201154656 │ 100.35 │ ████████████████████ │ │ a_lc_zstd │ 200446439 │ 148996404 │ 74.33 │ ██████████████▋ │ │ a_lz4 │ 1889003550 │ 393713202 │ 20.84 │ ████ │ │ a_zstd │ 1889003550 │ 231975292 │ 12.28 │ ██▍ │ └───────────┴──────────────┴────────────┴────────┴──────────────────────┘
  • 22. Materialized views restructure/reduce data cpu Table Ingest All CPU measurements Last measurement by CPU cpu_last_point_agg SummingMergeTree (Trigger) Compressed: ~0.0009% Uncompressed: ~0.002% cpu_last_point_mv Materialized View CREATE MATERIALIZED VIEW cpu_last_point_mv TO cpu_last_point_agg AS SELECT cpu_id, maxState(created_at) AS max_created_at, . . . argMaxState(usage_idle, created_at) AS usage_idle FROM cpu GROUP BY cpu_id
  • 23. Pattern: TTLs + downsampled views CREATE TABLE web_visits ( time DateTime CODEC(DoubleDelta, ZSTD), session_id UInt64, visitor_id UInt64, . . . ) Engine = MergeTree PARTITION BY toDate(time) ORDER BY (session_id, time) TTL time + INTERVAL 7 DAY web_visitsIngest hourly_uniq_visits hourly_sessions Ephemeral source data Long-term aggregates
  • 24. SET allow_experimental_data_skipping_indices=1; ALTER TABLE ontime ADD INDEX dest_name Dest TYPE ngrambf_v1(3, 512, 2, 0) GRANULARITY 1 ALTER TABLE ontime ADD INDEX cname Carrier TYPE set(0) GRANULARITY 1 OPTIMIZE TABLE ontime FINAL -- OR, in current releases ALTER TABLE ontime UPDATE Dest=Dest, Carrier=Carrier WHERE 1=1 Skip indexes cut down on I/O Default value
  • 25. Effectiveness depends on data distribution SELECT Year, count(*) AS flights, sum(Cancelled) / flights AS cancelled, sum(DepDel15) / flights AS delayed_15 FROM airline.ontime WHERE [Column] = [Value] GROUP BY Year Column Value Index Count Rows Processed Query Response Dest PPG ngrambf_v1 525 4.30M 0.053 Dest ATL ngrambf_v1 9,360,581 166.81M 0.622 Carrier ML set 70,622 3.39M 0.090 Carrier WN set 25,918,402 166.24M 0.566
  • 26. More table engines: CollapsingMergeTree CREATE TABLE collapse (user_id UInt64, views UInt64, sign Int8) ENGINE = CollapsingMergeTree(sign) PARTITION BY tuple() ORDER BY user_id; INSERT INTO collapse VALUES (32, 55, 1); INSERT INTO collapse VALUES (32, 55, -1); INSERT INTO collapse VALUES (32, 98, 1); SELECT * FROM collapse FINAL ┌─user_id─┬─views─┬─sign─┐ │ 32 │ 98 │ 1 │ └─────────┴───────┴──────┘
  • 30. ● Full Video Views ● Billions of Views/Month ● 500M Views/Customer ● 100K Beacons/Second ● Raw Data Queries Historical Metrics Clickhouse
  • 32. Citus Fancy Citus Super Fancy Citus 2016 2019 Mux Before Clickhouse
  • 33. ● Filter Depth ● Exclusion Filters ● Dynamic time aggregation ● Scale Unlocked Data Features
  • 35. Cost Benefits ● No Aggregation (CPU + Disk) ● Columnar compression ○ Low Cardinality ○ ⅓ Disk Size ● Smaller machines ● Halved costs while volume doubled
  • 36. Challenges ● Clickhouse != Postgres ● View Updates ● Individual record lookup
  • 37. Updates CREATE TABLE views ( view_id UUID, customer_id UUID, user_id UUID, view_time DateTime, ... rebuffer_count: UInt64, --- other metrics operating_system: Nullable(String), --- other filter dimensions ) ReplicatedCollapsingMergeTree(...,Sign) PARTITION BY toYYYYMMDD(view_time) ORDER BY (customer_id, view_time, view_id) ● “Resumed” views ● Low percentage of rows get updated ● FINAL keyword sometimes ○ Yes when listing views ○ No on larger metrics queries ○ Metrics workarounds ■ SUM(metric * Sign) / SUM(Sign) ● Nightly OPTIMIZE
  • 38. Record Lookup SELECT * FROM video_views WHERE view_time IN ( SELECT view_time FROM user_id_index WHERE customer_id=0 AND user_id='mux' ) AND customer_id=0 AND user_id='mux' CREATE MATERIALIZED VIEW video_views_index_user_id ON CLUSTER metrics ENGINE = ReplicatedCollapsingMergeTree(..., Sign) PARTITION BY toYYYYMMDD(view_end) PRIMARY KEY (property_id, user_id) ORDER BY (property_id, user_id, view_end) POPULATE AS SELECT property_id, user_id, view_end, Sign FROM video_views https://mux.com/blog/from-russia-with-love-how-clickhouse-saved-our-data/
  • 39. Nullable Clickhouse Documentation ● Test it ● We use Nullable extensively ● Minimal Performance Impact
  • 40. Deployment Details ● K8s ● 4 Clusters/4 Deployments ○ Historical Metrics ■ CH Replication ■ Secondary Cluster ■ 8 Node Clusters ○ Realtime Metrics ■ No Replication ■ Blue/Green Clusters ■ 5 Node Clusters ○ CDN logs ■ No Replication ■ Single Node ■ Increased Kafka retention ○ Raw Data Beacons ■ No Replication ■ 3 Day TTL ● All fronted by chproxy ○ https://github.com/Vertamedia/chproxy ○ Caching ○ User Routing ○ Rate Limiting ○ Prometheus Monitoring
  • 41. What Next? ● Advanced Alerting ● BI Metrics ● Data Warehousing ● Move “beacon processing” into clickhouse
  • 43. Takeaways ● Clickhouse performance feels like magic ● Operational simplicity, especially around scaling ● Clickhouse has become our default for statistic data
  • 44. Thank you! We are both hiring! Mux: https://mux.com Altinity: https://www.altinity.com