SlideShare a Scribd company logo
1 of 35
Download to read offline
Our Story w/ Clickhouse @seo.do
Metehan Çetinkaya
What is seo.do?
What is seo.do?
Yandex Metrica for SEO professionals.
50.000 keywords
50.000 keywords
50.000 x 2 x 100 x 365
50.000 keywords
50.000 x 2 x 100 x 365
3.6 billion rows per client
How did our use case comply with ClickHouse?
● Continuous data insertion
○ Since there is no lock implementation in ClickHouse, inserting data to the database does not
affect query performance
● No updates / deletes necessary
○ We don't have a use case for UPDATE & DELETE operations, to delete obsolete data we
simply use PARTITION and it's operations
■ You can still update the data with using ALTER mutations
● Billions of rows to be processed
○ Time series data which will be aggregated constantly to create reports etc.
● SQL dialect close to Standard SQL
How did our use case comply with ClickHouse?
● Very handy built-in functions
○ avgIf, sumIf, countIf, order by if, date conversion functions (toWeek, toMonth etc.), splitByChar
■ ‘ SELECT fullName, age, avgIf(wage, age BETWEEN 18 and 25) as youthWageAvg
from people where country_id = 745261’
● High compression ratio
○ Compression ratio totally depends on the data and since our data is timeseries and there are
repeated strings, compression ratio for us is around 1:15
● Continuous development and great community
How did we design our table schemas?
Understanding ClickHouse Index Structure
● Index structure is not similar to traditional RDBMS's index structure, no
B+Tree, does not create unique constraint
● Data is physically sorted on disk
○ There is a background job to sort and merge the data and it will take place eventually
● Need to choose primary key / sorting key by considering query conditions
○ To keep the reads at minimum, need to consider all possible queries with conditions and then
choose the primary key
How did we design our table schemas?
What Did We Do?
● Created initial table schemas on single server by considering our data
structure and possible queries
● Inserted billions of dummy data which completely simulated our actual case
○ Since the data we will have is time series, we created the actual data for one day, and
replicate it for 180 days
● Wrote sample queries, few queries for each dimension we would query the
data
Refactoring The Schemas & Writing Queries
● Ran single query each time and checked the results
○ Tip : Use ' tail -f /var/log/clickhouse-server/clickhouse-server.log ' to see execution logs of the
query (Peak MEM Usage, Threads executed, Marks read) that was executed. Or simply add
'send_logs_level=trace' when connecting to the cli ' clickhouse-client --send_logs_level=trace '
● Refactored the table schemas or even created new table schemas with
considering query results
● After satisfying results, we executed bunch of load tests with the sample
queries
Using JOIN in ClickHouse
● Since ClickHouse’s compression ratio is very high, compromised from storage
to boost the performance
○ How so? Did not force the tables to be atomic, to keep the relations & JOINs at minimum, let
there be duplicated data. It’s a trade off.
● Tried to avoid JOINs however with our data structure it was not possible, so
we kept JOINs at minimum
● Avoided using raw JOINs all the time
● Used JOINs with subqueries
Raw JOIN
select keyword, group_id from keyword_data as ss
INNER JOIN
keyword_info as kw on ss.keyword = kw.keyword
and ss.cid = kw.cid
and ss.cid = 149315
and ss.position = 5
and toDate(ss.timestamp) = toDate('2019-10-20')
Raw JOIN
● Took 5 seconds to complete
○ Slow and keeping system resources busy for long (This is important because ClickHouse can
fully utilize system resources under load, average query response time will be high)
● Processed 369.74 million rows
● Executed the query with 8 threads
○ Executed the query on a server which had 16 cores. By default ClickHouse sets max_threads
setting to half the number of physical core count, so this query utilized all the cores available
to ClickHouse
● Peak Memory Usage : 90 Megabytes
JOIN With Sub Query
select keyword, group_id from
(select keyword from keyword_data PREWHERE position = 5 where cid = 149315
and toDate(timestamp) = '2019-10-20') as ss
INNER JOIN
(select keyword, group_id from keyword_info where cid = 149315) as kw
on ss.keyword = kw.keyword
JOIN With Sub Query
● Took 100 milliseconds
○ 50 times faster than raw join and releases resources quickly
● Processed 65.54 thousand rows
● Executed the query with 2 threads
○ Was able to execute the query with 6 less threads. It is very important for us to keep thread
count at minimum for each query since our QPS rate will be high. If we were to ignore this, we
would have to solve our performance problems with new replicas in the future which means
new servers and constant money spend
● Peak Memory Usage : 5 Megabytes
Creating ClickHouse Cluster
● Replication
○ High Availability
○ Load Scaling
● Sharding
○ Data size
○ Split data into smaller parts
Creating ClickHouse Cluster
● Created a cluster with 6 servers (2 shards and 3 replicas)
● Set up ZooKeeper with 3 additional servers
○ Since latency is a critical point for ZooKeeper and ClickHouse can utilize all available system
resources we don’t run ZooKeeper on the same server with ClickHouse (ZooKeeper Cluster
with 3 servers can handle failure of 1 server)
● Used Clickhouse's internal load balancer mechanism to distribute queries
over the replicas
○ HAProxy or CHProxy could be used as separate load balancer
Chaos Testing
● Cluster is set up & running
● Developed a dummy API which will execute random queries with Golang Gin
○ This is not a performance test so we just went with the fastest way for us
● Created a basic load test which will make requests to our Go API
continuously, with this way we will be able to monitor ClickHouse behavior
with failures
Without Data Loss
● Initiated the load test
● Killed the ClickHouse in a random node during the load test
○ We are actually trying to simulate ClickHouse failure & temporary server crash in this scenario
● Monitored ClickHouse's load balancer's behavior and system's load
● Restarted the ClickHouse in that server
○ Load test is still being executed in the meantime
What Happened?
● Since the ClickHouse Server in chosen shard was unreachable, ClickHouse
load balancer stopped making requests to that node
● Queries started to be distributed over the remaining 2 replicas
● After restarting the Clickhouse Server, chosen node did not get any requests
for another 10 minutes then it started to receive requests
○ Because ClickHouse load balancer distributes queries with considering error counts
○ Error count is halved each minute
○ Maximum error count is 1000 by default
With Data Loss
● Chose an another shard randomly
● This time we didn’t just kill the ClickHouse Server, we formatted disks as well.
● CH configuration lost, all the data is lost
● Chosen node is still in ZooKeeper config
● Reconfigured the server and Reinstalled the CH
● Copied metadata from a running replica
● Monitored ClickHouse behavior and system's load
What Happened?
● Like in the previous case, chosen node did not get any requests
● After configuration of the server is completed, replication took place
● Like in the other scenario, chosen node started getting requests after 10
minutes
Deciding the API Framework
● Developed API endpoints which execute sample queries with Flask, Golang
Gin and FastAPI
● Initiated the load tests for three of them separately (10K requests per minute,
ran for 20 minutes)
● Monitored the results
Deciding the API Framework
● Flask
○ Was not able to handle all the requests so after some time errors started to raise
○ Average response time was 3 seconds and %10 of the incoming requests resulted with errors
● Gin
○ Was able to handle all the requests without errors
○ Average response time was 350 milliseconds and there was no error at all
● FastAPI
○ Was able to handle all the requests without errors as well
○ Average response time was 300 milliseconds without errors

More Related Content

What's hot

ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovAltinity Ltd
 
10 Good Reasons to Use ClickHouse
10 Good Reasons to Use ClickHouse10 Good Reasons to Use ClickHouse
10 Good Reasons to Use ClickHouserpolat
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfAltinity Ltd
 
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...Altinity Ltd
 
Better than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseBetter than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseAltinity Ltd
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOAltinity Ltd
 
Altinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Ltd
 
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOAltinity Ltd
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...Altinity Ltd
 
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021Altinity Ltd
 
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTOClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTOAltinity Ltd
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for ExperimentationGleb Kanterov
 
High Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseHigh Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseAltinity Ltd
 
How to Avoid Common Mistakes When Using Reactor Netty
How to Avoid Common Mistakes When Using Reactor NettyHow to Avoid Common Mistakes When Using Reactor Netty
How to Avoid Common Mistakes When Using Reactor NettyVMware Tanzu
 
Prometheus Project Journey
Prometheus Project JourneyPrometheus Project Journey
Prometheus Project JourneyJinwoong Kim
 
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...Altinity Ltd
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...Altinity Ltd
 
Vectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQLVectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQLJonathan Katz
 
Fast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
Fast Insight from Fast Data: Integrating ClickHouse and Apache KafkaFast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
Fast Insight from Fast Data: Integrating ClickHouse and Apache KafkaAltinity Ltd
 

What's hot (20)

ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei Milovidov
 
10 Good Reasons to Use ClickHouse
10 Good Reasons to Use ClickHouse10 Good Reasons to Use ClickHouse
10 Good Reasons to Use ClickHouse
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
 
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
 
Better than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseBetter than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouse
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
 
Altinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouse
 
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
 
ClickHouse Keeper
ClickHouse KeeperClickHouse Keeper
ClickHouse Keeper
 
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
 
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTOClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
 
High Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseHigh Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouse
 
How to Avoid Common Mistakes When Using Reactor Netty
How to Avoid Common Mistakes When Using Reactor NettyHow to Avoid Common Mistakes When Using Reactor Netty
How to Avoid Common Mistakes When Using Reactor Netty
 
Prometheus Project Journey
Prometheus Project JourneyPrometheus Project Journey
Prometheus Project Journey
 
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
 
Vectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQLVectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQL
 
Fast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
Fast Insight from Fast Data: Integrating ClickHouse and Apache KafkaFast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
Fast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
 

Similar to Our Story With ClickHouse at seo.do

Enabling presto to handle massive scale at lightning speed
Enabling presto to handle massive scale at lightning speedEnabling presto to handle massive scale at lightning speed
Enabling presto to handle massive scale at lightning speedShubham Tagra
 
Enabling Presto to handle massive scale at lightning speed
Enabling Presto to handle massive scale at lightning speedEnabling Presto to handle massive scale at lightning speed
Enabling Presto to handle massive scale at lightning speedShubham Tagra
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkDataWorks Summit
 
Journey through high performance django application
Journey through high performance django applicationJourney through high performance django application
Journey through high performance django applicationbangaloredjangousergroup
 
Beyond unit tests: Deployment and testing for Hadoop/Spark workflows
Beyond unit tests: Deployment and testing for Hadoop/Spark workflowsBeyond unit tests: Deployment and testing for Hadoop/Spark workflows
Beyond unit tests: Deployment and testing for Hadoop/Spark workflowsDataWorks Summit
 
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...Flink Forward
 
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutesDruid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutesShivji Kumar Jha
 
SOLR Power FTW: short version
SOLR Power FTW: short versionSOLR Power FTW: short version
SOLR Power FTW: short versionAlex Pinkin
 
Speed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with AlluxioSpeed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with AlluxioAlluxio, Inc.
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadKrivoy Rog IT Community
 
Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3LibbySchulze
 
Streaming millions of Contact Center interactions in (near) real-time with Pu...
Streaming millions of Contact Center interactions in (near) real-time with Pu...Streaming millions of Contact Center interactions in (near) real-time with Pu...
Streaming millions of Contact Center interactions in (near) real-time with Pu...Frank Kelly
 
Streaming Millions of Contact Center Interactions in (Near) Real-Time with Pu...
Streaming Millions of Contact Center Interactions in (Near) Real-Time with Pu...Streaming Millions of Contact Center Interactions in (Near) Real-Time with Pu...
Streaming Millions of Contact Center Interactions in (Near) Real-Time with Pu...StreamNative
 
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...Anna Ossowski
 
Debugging data pipelines @OLA by Karan Kumar
Debugging data pipelines @OLA by Karan KumarDebugging data pipelines @OLA by Karan Kumar
Debugging data pipelines @OLA by Karan KumarShubham Tagra
 
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB
 
Evolution of DBA in the Cloud Era
 Evolution of DBA in the Cloud Era Evolution of DBA in the Cloud Era
Evolution of DBA in the Cloud EraMydbops
 
Netflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesNetflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesEd Hunter
 
PostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major FeaturesPostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major FeaturesInMobi Technology
 

Similar to Our Story With ClickHouse at seo.do (20)

Enabling presto to handle massive scale at lightning speed
Enabling presto to handle massive scale at lightning speedEnabling presto to handle massive scale at lightning speed
Enabling presto to handle massive scale at lightning speed
 
Enabling Presto to handle massive scale at lightning speed
Enabling Presto to handle massive scale at lightning speedEnabling Presto to handle massive scale at lightning speed
Enabling Presto to handle massive scale at lightning speed
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache Flink
 
Journey through high performance django application
Journey through high performance django applicationJourney through high performance django application
Journey through high performance django application
 
Beyond unit tests: Deployment and testing for Hadoop/Spark workflows
Beyond unit tests: Deployment and testing for Hadoop/Spark workflowsBeyond unit tests: Deployment and testing for Hadoop/Spark workflows
Beyond unit tests: Deployment and testing for Hadoop/Spark workflows
 
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
 
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutesDruid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
 
SOLR Power FTW: short version
SOLR Power FTW: short versionSOLR Power FTW: short version
SOLR Power FTW: short version
 
Speed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with AlluxioSpeed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with Alluxio
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High load
 
Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3
 
Streaming millions of Contact Center interactions in (near) real-time with Pu...
Streaming millions of Contact Center interactions in (near) real-time with Pu...Streaming millions of Contact Center interactions in (near) real-time with Pu...
Streaming millions of Contact Center interactions in (near) real-time with Pu...
 
Streaming Millions of Contact Center Interactions in (Near) Real-Time with Pu...
Streaming Millions of Contact Center Interactions in (Near) Real-Time with Pu...Streaming Millions of Contact Center Interactions in (Near) Real-Time with Pu...
Streaming Millions of Contact Center Interactions in (Near) Real-Time with Pu...
 
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
 
Debugging data pipelines @OLA by Karan Kumar
Debugging data pipelines @OLA by Karan KumarDebugging data pipelines @OLA by Karan Kumar
Debugging data pipelines @OLA by Karan Kumar
 
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
 
Evolution of DBA in the Cloud Era
 Evolution of DBA in the Cloud Era Evolution of DBA in the Cloud Era
Evolution of DBA in the Cloud Era
 
Netflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesNetflix SRE perf meetup_slides
Netflix SRE perf meetup_slides
 
Cloud arch patterns
Cloud arch patternsCloud arch patterns
Cloud arch patterns
 
PostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major FeaturesPostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major Features
 

Recently uploaded

%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...masabamasaba
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyviewmasabamasaba
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...masabamasaba
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxAnnaArtyushina1
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationJuha-Pekka Tolvanen
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrandmasabamasaba
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 

Recently uploaded (20)

%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaS
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 

Our Story With ClickHouse at seo.do

  • 1. Our Story w/ Clickhouse @seo.do Metehan Çetinkaya
  • 3. What is seo.do? Yandex Metrica for SEO professionals.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 15. 50.000 keywords 50.000 x 2 x 100 x 365
  • 16. 50.000 keywords 50.000 x 2 x 100 x 365 3.6 billion rows per client
  • 17. How did our use case comply with ClickHouse? ● Continuous data insertion ○ Since there is no lock implementation in ClickHouse, inserting data to the database does not affect query performance ● No updates / deletes necessary ○ We don't have a use case for UPDATE & DELETE operations, to delete obsolete data we simply use PARTITION and it's operations ■ You can still update the data with using ALTER mutations ● Billions of rows to be processed ○ Time series data which will be aggregated constantly to create reports etc. ● SQL dialect close to Standard SQL
  • 18. How did our use case comply with ClickHouse? ● Very handy built-in functions ○ avgIf, sumIf, countIf, order by if, date conversion functions (toWeek, toMonth etc.), splitByChar ■ ‘ SELECT fullName, age, avgIf(wage, age BETWEEN 18 and 25) as youthWageAvg from people where country_id = 745261’ ● High compression ratio ○ Compression ratio totally depends on the data and since our data is timeseries and there are repeated strings, compression ratio for us is around 1:15 ● Continuous development and great community
  • 19. How did we design our table schemas? Understanding ClickHouse Index Structure ● Index structure is not similar to traditional RDBMS's index structure, no B+Tree, does not create unique constraint ● Data is physically sorted on disk ○ There is a background job to sort and merge the data and it will take place eventually ● Need to choose primary key / sorting key by considering query conditions ○ To keep the reads at minimum, need to consider all possible queries with conditions and then choose the primary key
  • 20. How did we design our table schemas? What Did We Do? ● Created initial table schemas on single server by considering our data structure and possible queries ● Inserted billions of dummy data which completely simulated our actual case ○ Since the data we will have is time series, we created the actual data for one day, and replicate it for 180 days ● Wrote sample queries, few queries for each dimension we would query the data
  • 21. Refactoring The Schemas & Writing Queries ● Ran single query each time and checked the results ○ Tip : Use ' tail -f /var/log/clickhouse-server/clickhouse-server.log ' to see execution logs of the query (Peak MEM Usage, Threads executed, Marks read) that was executed. Or simply add 'send_logs_level=trace' when connecting to the cli ' clickhouse-client --send_logs_level=trace ' ● Refactored the table schemas or even created new table schemas with considering query results ● After satisfying results, we executed bunch of load tests with the sample queries
  • 22. Using JOIN in ClickHouse ● Since ClickHouse’s compression ratio is very high, compromised from storage to boost the performance ○ How so? Did not force the tables to be atomic, to keep the relations & JOINs at minimum, let there be duplicated data. It’s a trade off. ● Tried to avoid JOINs however with our data structure it was not possible, so we kept JOINs at minimum ● Avoided using raw JOINs all the time ● Used JOINs with subqueries
  • 23. Raw JOIN select keyword, group_id from keyword_data as ss INNER JOIN keyword_info as kw on ss.keyword = kw.keyword and ss.cid = kw.cid and ss.cid = 149315 and ss.position = 5 and toDate(ss.timestamp) = toDate('2019-10-20')
  • 24. Raw JOIN ● Took 5 seconds to complete ○ Slow and keeping system resources busy for long (This is important because ClickHouse can fully utilize system resources under load, average query response time will be high) ● Processed 369.74 million rows ● Executed the query with 8 threads ○ Executed the query on a server which had 16 cores. By default ClickHouse sets max_threads setting to half the number of physical core count, so this query utilized all the cores available to ClickHouse ● Peak Memory Usage : 90 Megabytes
  • 25. JOIN With Sub Query select keyword, group_id from (select keyword from keyword_data PREWHERE position = 5 where cid = 149315 and toDate(timestamp) = '2019-10-20') as ss INNER JOIN (select keyword, group_id from keyword_info where cid = 149315) as kw on ss.keyword = kw.keyword
  • 26. JOIN With Sub Query ● Took 100 milliseconds ○ 50 times faster than raw join and releases resources quickly ● Processed 65.54 thousand rows ● Executed the query with 2 threads ○ Was able to execute the query with 6 less threads. It is very important for us to keep thread count at minimum for each query since our QPS rate will be high. If we were to ignore this, we would have to solve our performance problems with new replicas in the future which means new servers and constant money spend ● Peak Memory Usage : 5 Megabytes
  • 27. Creating ClickHouse Cluster ● Replication ○ High Availability ○ Load Scaling ● Sharding ○ Data size ○ Split data into smaller parts
  • 28. Creating ClickHouse Cluster ● Created a cluster with 6 servers (2 shards and 3 replicas) ● Set up ZooKeeper with 3 additional servers ○ Since latency is a critical point for ZooKeeper and ClickHouse can utilize all available system resources we don’t run ZooKeeper on the same server with ClickHouse (ZooKeeper Cluster with 3 servers can handle failure of 1 server) ● Used Clickhouse's internal load balancer mechanism to distribute queries over the replicas ○ HAProxy or CHProxy could be used as separate load balancer
  • 29. Chaos Testing ● Cluster is set up & running ● Developed a dummy API which will execute random queries with Golang Gin ○ This is not a performance test so we just went with the fastest way for us ● Created a basic load test which will make requests to our Go API continuously, with this way we will be able to monitor ClickHouse behavior with failures
  • 30. Without Data Loss ● Initiated the load test ● Killed the ClickHouse in a random node during the load test ○ We are actually trying to simulate ClickHouse failure & temporary server crash in this scenario ● Monitored ClickHouse's load balancer's behavior and system's load ● Restarted the ClickHouse in that server ○ Load test is still being executed in the meantime
  • 31. What Happened? ● Since the ClickHouse Server in chosen shard was unreachable, ClickHouse load balancer stopped making requests to that node ● Queries started to be distributed over the remaining 2 replicas ● After restarting the Clickhouse Server, chosen node did not get any requests for another 10 minutes then it started to receive requests ○ Because ClickHouse load balancer distributes queries with considering error counts ○ Error count is halved each minute ○ Maximum error count is 1000 by default
  • 32. With Data Loss ● Chose an another shard randomly ● This time we didn’t just kill the ClickHouse Server, we formatted disks as well. ● CH configuration lost, all the data is lost ● Chosen node is still in ZooKeeper config ● Reconfigured the server and Reinstalled the CH ● Copied metadata from a running replica ● Monitored ClickHouse behavior and system's load
  • 33. What Happened? ● Like in the previous case, chosen node did not get any requests ● After configuration of the server is completed, replication took place ● Like in the other scenario, chosen node started getting requests after 10 minutes
  • 34. Deciding the API Framework ● Developed API endpoints which execute sample queries with Flask, Golang Gin and FastAPI ● Initiated the load tests for three of them separately (10K requests per minute, ran for 20 minutes) ● Monitored the results
  • 35. Deciding the API Framework ● Flask ○ Was not able to handle all the requests so after some time errors started to raise ○ Average response time was 3 seconds and %10 of the incoming requests resulted with errors ● Gin ○ Was able to handle all the requests without errors ○ Average response time was 350 milliseconds and there was no error at all ● FastAPI ○ Was able to handle all the requests without errors as well ○ Average response time was 300 milliseconds without errors