Building a Real Time Analytics API at Scale
DataXDay, May 17th 2018
Sylvain Friquet (@sylvainfriquet), Software Engineer

Algolia: Search as a Service
> As-you-type speed: results in milliseconds at every keystroke.
> Relevance: finding the best content for every intent.
> User experience: delightfully engaging, impressively intuitive.
@DataXDay
Algolia by the numbers
> 16 regions
> 55 data centers
> 40B searches/month
> 150B API calls/month
> Founded in 2012
> 200 employees
> 4,500 customers
> $74M in funding

Algolia Search Analytics

Where we started
> 4-year-old project
> Elasticsearch
> Self-hosted
> Grew from 500M to 40B searches/month
> Upgrading the ES cluster had become too tedious

What we wanted
> Sub-second API response time
> Low latency
> Large retention
> Billions of events per day
> Scales with us
> Hosted solution

The big picture

Datastores we considered
> BigQuery
> Redshift
> Citus

Citus
> Postgres extension
> Distributed
> Multi-tenant
> Near-real-time analytics
> Scale-out

Sub-second analytics queries
> Ingesting raw events
> Rolling them up
> Sub-second API queries

Rollup
> Aggregation
> Coarse-grained analysis
> Predetermined queries

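As a rough illustration of the rollup idea, the Python sketch below folds raw events into one aggregate row per (app_id, 5-minute window). The event tuples, the `bucket_start` helper and the in-memory dict are assumptions made for illustration only; the deck itself does this in SQL on Citus.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bucket_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 5-minute window."""
    seconds = (ts.minute % 5) * 60 + ts.second
    return ts - timedelta(seconds=seconds, microseconds=ts.microsecond)


def rollup(events):
    """Aggregate (app_id, timestamp, query) events into per-window counts."""
    counts = defaultdict(int)
    for app_id, ts, _query in events:
        counts[(app_id, bucket_start(ts))] += 1
    return dict(counts)


# Hypothetical sample events: (app_id, event timestamp, query text).
events = [
    ("app1", datetime(2018, 5, 17, 10, 1, 30), "shoes"),
    ("app1", datetime(2018, 5, 17, 10, 4, 59), "socks"),
    ("app1", datetime(2018, 5, 17, 10, 5, 0), "shoes"),
    ("app2", datetime(2018, 5, 17, 10, 2, 0), "hats"),
]
rollups = rollup(events)
```

Four raw events collapse into three rollup rows here. The API then only ever reads the rollup rows, which is why the set of supported queries has to be decided up front.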
Ingesting raw events
> Simple schema
> Shard by tenant
> Batch insert (parallel COPY)
> Up to 7M rows/s

CREATE TABLE queries (
    app_id     text,
    timestamp  timestamp,
    query      text,
    user_id    text,
    created_at timestamp DEFAULT now()
);

-- Shard the table across the cluster by tenant (app_id).
SELECT create_distributed_table('queries', 'app_id');

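To make "shard by tenant" concrete, here is a minimal Python sketch of hash placement: every row of a tenant lands on the same shard, so batches for different shards can be loaded in parallel. The `zlib.crc32` hash and the modulo placement are stand-ins chosen for illustration, not Citus's actual hash function or shard-range scheme, and `shard_for` and `batch_by_shard` are hypothetical names. The shard count of 64 is the figure quoted on the "Some numbers" slide.

```python
import zlib
from collections import defaultdict

SHARD_COUNT = 64  # shard count quoted on the "Some numbers" slide


def shard_for(app_id: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a tenant to a shard with a stable hash (a stand-in, not Citus's own)."""
    return zlib.crc32(app_id.encode("utf-8")) % shard_count


def batch_by_shard(rows):
    """Group rows by target shard so each batch can be loaded independently."""
    batches = defaultdict(list)
    for row in rows:
        batches[shard_for(row["app_id"])].append(row)
    return dict(batches)


# Hypothetical sample rows.
rows = [
    {"app_id": "app1", "query": "shoes"},
    {"app_id": "app2", "query": "hats"},
    {"app_id": "app1", "query": "socks"},
]
batches = batch_by_shard(rows)
```

Because the hash depends only on `app_id`, a query scoped to one tenant touches a single shard.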
Rollup table
> Aggregate metrics per time window
> TOPN and HLL extensions
> Colocated with raw event tables

CREATE TABLE rollups_5min (
    timestamp   timestamp,
    app_id      text,
    query_count bigint,
    user_count  hll,    -- HyperLogLog sketch of distinct users
    top_queries jsonb   -- TopN counters
);

-- Same distribution column as the raw table, so rollups stay colocated.
SELECT create_distributed_table('rollups_5min', 'app_id');

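The `user_count` column stores a HyperLogLog sketch rather than a number, which is what lets distinct counts from different time windows be merged later. The toy Python implementation below shows the principle; it is a simplified sketch written for this explanation, not the code of the postgresql-hll extension, and the class name `HLL`, the SHA-1 based hash and the precision `p=12` are all assumptions.

```python
import hashlib
import math


class HLL:
    """Tiny HyperLogLog: 2**p registers, each holding the largest rank seen."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        # 64-bit hash: the top p bits pick a register, the rest give the rank.
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def union(self, other: "HLL") -> "HLL":
        """Merge two sketches by taking the register-wise maximum."""
        merged = HLL(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def cardinality(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            estimate = self.m * math.log(self.m / zeros)
        return estimate


# Two overlapping windows of hypothetical user ids.
window_a, window_b = HLL(), HLL()
for i in range(30000):
    window_a.add(f"user-{i}")
for i in range(20000, 50000):
    window_b.add(f"user-{i}")
both = window_a.union(window_b)
```

The union of the two sketches is identical to the sketch that would have been built from all users at once, so users present in both windows are not double counted. Adding the two per-window estimates together would overcount them.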
Rollup query
> Periodic rollup
> Concurrently executed across workers
> Out-of-order events
> Incremental or idempotent

INSERT INTO rollups_5min
SELECT
    date_trunc('seconds', …) AS minute,
    app_id,
    count(*) AS query_count,
    -- user_id is text, so hash it with hll_hash_text
    hll_add_agg(hll_hash_text(user_id)) AS user_count,
    topn_add_agg(query) AS top_queries
FROM queries
-- Window on created_at (insertion time), so late-arriving events are still picked up
WHERE created_at >= $1 AND created_at <= $2
GROUP BY app_id, minute
-- minute fills the timestamp column; assumes a unique index on (app_id, timestamp)
ON CONFLICT (app_id, timestamp)
DO UPDATE SET
    query_count = rollups_5min.query_count + EXCLUDED.query_count,
    user_count  = hll_union(rollups_5min.user_count, EXCLUDED.user_count),
    top_queries = topn_union(rollups_5min.top_queries, EXCLUDED.top_queries);

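The incremental upsert can be pictured with plain Python containers. In this sketch a `set` stands in for the HLL column and a `Counter` for the TopN column; `upsert` and the sample keys are hypothetical names used only for illustration.

```python
from collections import Counter


def upsert(rollups, key, query_count, users, top_queries):
    """Merge a freshly computed aggregate into the stored row for `key`."""
    if key not in rollups:
        rollups[key] = {"query_count": 0, "users": set(), "top_queries": Counter()}
    row = rollups[key]
    row["query_count"] += query_count        # counts add up
    row["users"] |= users                    # set union stands in for an HLL union
    row["top_queries"].update(top_queries)   # per-query counters are summed


rollups = {}
key = ("app1", "2018-05-17 10:00")

# First rollup run for the window.
upsert(rollups, key, 3, {"u1", "u2"}, Counter({"shoes": 2, "socks": 1}))
# A later run picks up events that arrived late for the same window.
upsert(rollups, key, 2, {"u2", "u3"}, Counter({"shoes": 1, "hats": 1}))
```

After both runs the row holds 5 queries, 3 distinct users and "shoes" as the top query with 3 hits. This incremental form is only correct if each raw event is rolled up exactly once; the idempotent alternative recomputes a whole window and overwrites the stored row.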
Rolling up the rollups
> Further aggregate 5min rollups into 1day rollups
> 200x to 50,000x compression ratio in our case

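A second-level rollup applies the same merge once more, this time keyed by day. The sketch below assumes simplified rows of the form (app_id, window_start, query_count, top_queries) and a hypothetical `rollup_to_day` helper; the distinct-user column is left out to keep it short.

```python
from collections import Counter, defaultdict
from datetime import datetime


def rollup_to_day(rows_5min):
    """Merge (app_id, window_start, query_count, top_queries) rows into daily rows."""
    days = defaultdict(lambda: {"query_count": 0, "top_queries": Counter()})
    for app_id, window_start, query_count, top_queries in rows_5min:
        day = days[(app_id, window_start.date())]
        day["query_count"] += query_count
        day["top_queries"].update(top_queries)
    return dict(days)


# Hypothetical 5-minute rollup rows.
rows_5min = [
    ("app1", datetime(2018, 5, 17, 10, 0), 3, {"shoes": 2, "socks": 1}),
    ("app1", datetime(2018, 5, 17, 10, 5), 2, {"shoes": 2}),
    ("app1", datetime(2018, 5, 18, 0, 0), 1, {"hats": 1}),
]
daily = rollup_to_day(rows_5min)

WINDOWS_PER_DAY = 24 * 60 // 5  # 288 five-minute windows per day
```

Up to 288 five-minute rows per tenant collapse into a single daily row at this step. The compression ratios quoted on the slide are the speaker's own measurements and are not derived here.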
API queries
Sample queries
> Count
SELECT sum(query_count) FROM … rollups_1day UNION ALL rollups_5min … WHERE …
> Distinct approximate count
SELECT hll_cardinality(hll_union_agg(user_count))::bigint FROM ...
> TopN
SELECT (topn(topn_union_agg(top_queries), 10)).* FROM ...

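One plausible reading of the `rollups_1day UNION ALL rollups_5min` pattern is that complete days are served from the daily table and the still-open day from the 5-minute table. The Python sketch below follows that assumption; the dict layouts and the `count_and_top` function are hypothetical and only mirror the Count and TopN queries.

```python
from collections import Counter
from datetime import date, datetime


def count_and_top(rollups_1day, rollups_5min, app_id, today, n=10):
    """Sum counts and merge top queries from both rollup granularities."""
    total, top = 0, Counter()
    # Complete days come from the daily table ...
    for (app, day), row in rollups_1day.items():
        if app == app_id and day < today:
            total += row["query_count"]
            top.update(row["top_queries"])
    # ... and the current, still-open day from the 5-minute table.
    for (app, window_start), row in rollups_5min.items():
        if app == app_id and window_start.date() == today:
            total += row["query_count"]
            top.update(row["top_queries"])
    return total, top.most_common(n)


# Hypothetical rollup contents.
rollups_1day = {
    ("app1", date(2018, 5, 16)): {"query_count": 100, "top_queries": {"shoes": 60, "hats": 40}},
}
rollups_5min = {
    ("app1", datetime(2018, 5, 17, 10, 0)): {"query_count": 5, "top_queries": {"shoes": 1, "socks": 4}},
    ("app2", datetime(2018, 5, 17, 10, 0)): {"query_count": 7, "top_queries": {"belts": 7}},
}
total, top = count_and_top(rollups_1day, rollups_5min, "app1", date(2018, 5, 17), n=2)
```

For `app1` this yields a total of 105 queries, with "shoes" (61) and "hats" (40) as the top two. Only a handful of pre-aggregated rows are read per request, which is what keeps the API response time low.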
Some numbers
> Every 5 min: ~20s to roll up 20M rows
> 64 shards, 48 vCPUs, 366 GB RAM
> API latency: p99 < 800ms, p95 < 500ms

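A quick back-of-the-envelope check of these figures, assuming that 20M rows per 5-minute window is a typical load (the slide does not say whether it is an average or a peak):

```python
ROLLUP_PERIOD_S = 5 * 60        # a rollup runs every 5 minutes
ROWS_PER_ROLLUP = 20_000_000    # ~20M rows per run (from the slide)
ROLLUP_DURATION_S = 20          # ~20 s per run (from the slide)

# Average rate at which raw rows arrive.
ingest_rate = ROWS_PER_ROLLUP / ROLLUP_PERIOD_S
# Rate at which rows are processed while a rollup is running.
rollup_rate = ROWS_PER_ROLLUP / ROLLUP_DURATION_S
# Share of each 5-minute period spent rolling up.
busy_fraction = ROLLUP_DURATION_S / ROLLUP_PERIOD_S
# Rows per day if every window carried the same load.
rows_per_day = ROWS_PER_ROLLUP * (24 * 60 * 60 // ROLLUP_PERIOD_S)
```

Under that assumption the cluster ingests roughly 66,667 rows/s on average, rolls up about 1M rows/s while a run is active, and spends about 6.7% of each period on rollups. The implied 5.76 billion rows per day is in line with the "billions of events per day" requirement.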
Conclusion
> Rollup approach working at scale
> Citus becoming the foundation for several new products (Click Analytics...)

Thank you
The video of this presentation will soon be available at dataxday.fr
Thanks to our sponsors
Stay tuned by following @DataXDay
