How we scaled Rudder to 10k nodes
And the road to 50k nodes
Nicolas CHARLES
Co-founder and COO
@nico_charles
2
Scalability?
"Scalability is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth."
https://en.wikipedia.org/wiki/Scalability
3
Scalability – why is it an issue in Rudder?
What does Rudder do?
● Users define policies
● Apply them on groups of nodes
● Rudder computes the policies for each node
● Agents apply them, and send back information
● Rudder computes the compliance
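The workflow above can be sketched in a few lines. This is only an illustration under stated assumptions: the Rule class, the function names, and the "success"/"error" statuses are hypothetical and do not come from Rudder's code base.

```python
from dataclasses import dataclass

# Hypothetical, simplified model: none of these names come from Rudder itself.
@dataclass(frozen=True)
class Rule:
    name: str    # identifier of the rule
    group: str   # name of the node group the rule is applied to

def compute_node_policies(rules, groups):
    """Expand rules applied to groups into the list of rules each node must apply."""
    policies = {}
    for rule in rules:
        for node in groups.get(rule.group, ()):
            policies.setdefault(node, []).append(rule)
    return policies

def compute_compliance(policies, reports):
    """Return, per node, the fraction of its rules reported as 'success'."""
    compliance = {}
    for node, node_rules in policies.items():
        ok = sum(1 for r in node_rules if reports.get((node, r.name)) == "success")
        compliance[node] = ok / len(node_rules)
    return compliance

groups = {"web": ["node1", "node2"], "all": ["node1", "node2", "node3"]}
rules = [Rule("ntp", "all"), Rule("nginx", "web")]
policies = compute_node_policies(rules, groups)

# Information sent back by the agents after applying their policies.
reports = {
    ("node1", "ntp"): "success", ("node1", "nginx"): "success",
    ("node2", "ntp"): "success", ("node2", "nginx"): "error",
    ("node3", "ntp"): "success",
}
compliance = compute_compliance(policies, reports)
```

Every step is a loop over nodes, rules, or reports, which is why each of them becomes a scalability concern as the number of nodes grows.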
4
Scalability – why is it an issue in Rudder?
Each of these points needs to go fast
● Process node inventories quickly
● Have a fast UI
● Generate policies in a reasonable time
● Have fast agents, and don’t overflow the network
● Have the compliance of the actual state available
5
Rudder Architecture
6
Rudder Architecture
[Diagram: Rudder Server Root, with Interfaces (CLI, Web UI, API), Uses, Applications (Compliance, Configuration, Inventory), Plugins, Rudder Engine, and Techniques; three Nodes, two running a Rudder Agent and one acting as a Rudder relay.]
7
The origin of Rudder
● At first, Rudder was designed for a hundred or a few hundred nodes
● No real goal for scalability
● It was, in retrospect, an MVP
8
The origin of Rudder
● Scalability went up, driven by
  ● Users and usages
    – Frustration over slowdowns
    – More managed servers
  ● Features
    – Some features needed much improved performance
    – Some needed massive architectural change
9
First bottlenecks to tackle
● Reporting in Rudder
  ● Display compliance of nodes
    – Change the data model, as everything was Rule Centric in Rudder 2.3
  ● Slow display of reports and compliance
    – Remember, we were supporting PostgreSQL 8.x
    – Adding relevant indexes
● Agent side
  ● Agent was already used in critical systems, but impacted performance of nodes
    – Rewrite some policies
    – Add tooling around agent to prevent clogging
● Rudder 2.5 was not more scalable, but more consistent
10
Scalability – Step by Step
Bandwidth & Network
- Flag files to detect new policies
- Relay servers
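A flag file lets an agent check one tiny file instead of comparing the whole policy set on every run. The sketch below shows the idea only; the file name, the function names, and the download_all callback are hypothetical and are not Rudder's actual protocol.

```python
from pathlib import Path

def needs_policy_update(local_flag: Path, server_flag_content: str) -> bool:
    """Return True when the server's flag differs from the locally stored copy."""
    if not local_flag.exists():
        return True
    return local_flag.read_text() != server_flag_content

def update_policies(local_flag: Path, server_flag_content: str, download_all) -> bool:
    """Download the full policy set only when the flag changed; return True if it did."""
    if not needs_policy_update(local_flag, server_flag_content):
        return False
    download_all()                              # expensive step, skipped on most runs
    local_flag.write_text(server_flag_content)  # remember which generation we now hold
    return True
```

With this scheme, a run where nothing changed costs one small transfer, which is what saves bandwidth when thousands of agents check in regularly.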
11
Scalability – Step by Step
Scale the uses
- Validation workflow
- Synchronisation of Rudder servers
- API
- More Techniques
12
Scalability – Step by Step
Improve performance
- Save only changes of Inventories (several orders of magnitude faster)
- Change data model for Compliance (30% faster compliance)
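Saving only the changes of an inventory amounts to diffing the received inventory against the stored one and writing just the difference. A minimal sketch, assuming inventories can be represented as flat dictionaries (the attribute names are made up for the example):

```python
def inventory_changes(stored: dict, received: dict) -> dict:
    """Return only the attributes that were added, modified, or removed."""
    changes = {}
    for key, value in received.items():
        if key not in stored:
            changes[key] = ("added", value)
        elif stored[key] != value:
            changes[key] = ("modified", value)
    for key in stored.keys() - received.keys():
        changes[key] = ("removed", None)
    return changes

stored = {"hostname": "web1", "ram_mb": 4096, "os": "Debian 9"}
received = {"hostname": "web1", "ram_mb": 8192, "os": "Debian 9", "kernel": "4.9"}
changes = inventory_changes(stored, received)
```

Most inventories are nearly identical from one run to the next, so the set of changes to persist is usually empty or very small.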
13
Scalability – 2.9 & 2.10
● Improving performance is one of the focuses
  ● Refactoring and code improvements to improve policy generation time
    – Use of hashes and caches
  ● Fighting with the ORM to have lighter queries
    – Far fewer commits
● Make impact on network and node adjustable
  ● Configure agent run frequency: it can be tuned based on the performance of nodes and available bandwidth
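The slides only mention "hashes and caches", without details. One common way to combine them, shown here as an assumption rather than as Rudder's implementation, is to hash everything that influences a node's policies and skip the node when the hash is unchanged. All names are hypothetical.

```python
import hashlib
import json

def inputs_hash(node_inputs: dict) -> str:
    """Stable hash of everything that influences a node's generated policies."""
    canonical = json.dumps(node_inputs, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def nodes_to_regenerate(all_inputs: dict, previous_hashes: dict) -> list:
    """Return the nodes whose inputs changed, updating previous_hashes in place."""
    changed = []
    for node, node_inputs in all_inputs.items():
        digest = inputs_hash(node_inputs)
        if previous_hashes.get(node) != digest:
            changed.append(node)
            previous_hashes[node] = digest
    return changed

cache = {}
inputs = {
    "node1": {"rules": ["ntp"], "params": {"server": "ntp.example.org"}},
    "node2": {"rules": ["ntp", "nginx"], "params": {}},
}
first_run = nodes_to_regenerate(inputs, cache)    # nothing cached yet
second_run = nodes_to_regenerate(inputs, cache)   # inputs unchanged
inputs["node2"]["rules"].append("ssh")
third_run = nodes_to_regenerate(inputs, cache)    # only node2 changed
```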
14
Scalability – 2.9 & 2.10
● First industrialized performance tests, with Tsung
  ● Generated inventories automatically, and sent them to the endpoint
  ● Tests with thousands of inventories
● Thank you @cscmeu!
http://tsung.erlang-projects.org/
15
Scalability – 2.11
● Goal: manage thousands of nodes
● Distributed setup
  – Make Rudder scale by adding more servers for components
● UI more responsive to user requests
  – Async
  – LDAP optimizations
    ● No more indexes (everything fits in RAM)
● Much faster policy generation
  – Changed variable lookup, more caching
  – Used a bit of parallelism when it was easy
● More performance tests
  – A big thank-you to users pushing the limits
16
Scale the uses – Rudder 2.11
● Technique Editor: everyone can create techniques
  ● Uses ncf
  ● Graphical User Interface to make Techniques easier to write
17
Rudder 3
Complete change of UI
- Design and layout
Compliance is everywhere
- Everything is async
- Everything is cached
18
Rudder 3
New data model: Node Centric
- Compliance is per node
- Cached
- And lazily computed
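"Per node, cached, and lazily computed" can be illustrated with a small cache that computes a node's compliance on first read and drops it when new reports arrive. This is a sketch of the pattern, not Rudder's code; the class and method names are hypothetical.

```python
class ComplianceCache:
    """Per-node compliance, computed on first read and cached until invalidated."""

    def __init__(self, compute):
        self._compute = compute   # function: node_id -> compliance value
        self._cache = {}

    def get(self, node_id):
        # Lazy: nothing is computed for a node until someone asks for it.
        if node_id not in self._cache:
            self._cache[node_id] = self._compute(node_id)
        return self._cache[node_id]

    def invalidate(self, node_id):
        """Call when new reports arrive for a node, so the next read recomputes."""
        self._cache.pop(node_id, None)
```

Because the cache key is the node, a new report invalidates one entry only, instead of every rule that touches that node.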
19
Rudder 3
Lightweight reports
- Changes-only reporting
- Send reports only for changes
And much less disk usage
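Sending reports only for changes can be pictured as a filter on the results of an agent run. The sketch assumes, for illustration only, that each result carries a status of "success", "repaired", or "error", and that unchanged "success" results are the ones left out.

```python
def reports_to_send(results, changes_only=True):
    """Filter an agent run's results; in changes-only mode, drop plain successes."""
    if not changes_only:
        return list(results)
    return [r for r in results if r["status"] != "success"]

run_results = [
    {"component": "ntp", "status": "success"},
    {"component": "nginx", "status": "repaired"},
    {"component": "ssh", "status": "success"},
    {"component": "motd", "status": "error"},
]
sent = reports_to_send(run_results)
```

On a stable node most results are successes, so far fewer lines are sent and stored.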
20
Rudder 3
● For this release, devs had between 1000 and 2000 nodes on their dev systems
● A lot of timing info embedded in Rudder
  ● Made it possible to identify low-hanging fruit
● As a result, everything was much faster
  ● 500ms compute time with 2000 nodes was considered slow, and reported as a bug
21
Rudder 3.1 – 5000 nodes
● Rudder 3.1 – reaching the 5000-node limit (well – 7500 at the end of its life)
  ● This is the land of micro-optimization, pushing the limits of the model
    – Lazy variables to prevent computation of unwanted values
  ● Micro-tuning of techniques to make policy generation faster
    – But we are still talking about 45 minutes for 5000 nodes with policy validation
  ● Massive performance upgrade of the agent
    – Changed the algorithmic complexity of managing big policies
22
Rudder 3.1 – 5000 nodes
● Tooling to generate compliance reports from nodes
  ● Load servers, detect issues in compliance computing
● Extensive use of pgBadger to analyze PostgreSQL logs
  – From both test benches and production systems
  – Finding the slow queries and the limits
● Thank you @matya_j!!
https://github.com/dalibo/pgbadger
23
Rudder 4: going beyond
24
Rudder 4.0: massive changes
● Policies
  ● Each policy is identified by an id
  ● Change database model
    – Use Doobie, an excellent JDBC layer for Scala that lets you write proper SQL
    – Configuration is stored in JSON rather than in joined tables
  ● No "leaking" of policy changes from one node to another
    – Regenerate only for the nodes that have been changed
  ● Policy generation is much faster
    – About 30 times faster (without policy validation)
25
Rudder 4.0: massive changes
● Compliance
  ● Compliance is computed when reports are received server side, and cached
    – Display of compliance twice as fast with 1000 nodes, an order of magnitude faster with 5000 nodes
● Audit mode
● New LDAP backend (LMDB based)
26
Rudder 4.1: the road to 10k
● UI is much faster
  ● All resources are cached
  ● Compress everything (big impact on bad networks with large installs and a distant server)
● Policy generation is pretty fast (if we don’t validate the policies)
  ● About 3 minutes for 7000 nodes
● External data sources
  ● Can be triggered by changes in a remote tool
● Hooks on events
  ● Allow fine-tuning the behaviour of node acceptance/deletion/policy generation
● Thank you @FlorianHeigl1!
27
Rudder 4.3: 10k
● Policy engine has been rewritten
  ● Pluggable, less mutable, a bit faster
● We can manage 10k nodes on one Rudder server
  ● Recommended configuration is 11 GB for the Web Interface for 10k nodes
  ● Adding more RAM/CPU/IO is enough to go to 15k nodes
● Still not perfect
  ● Policy generation is long with 10k nodes and policy validation activated
  ● UI will be sluggish, because of DOM computations
    – Might be OK with Firefox 59
  ● API will be OK
28
What’s next?
● Improve tooling suite
  ● Working with Florian Heigl to automate a super large test platform
    – Automatically create nodes, rules, reports
    – At a high rate
    – Check application response rate and load
  ● Find new bottlenecks using sysdig
29
What’s next?
● Improve tooling suite
  ● Improve usability and documentation of load tools
    – So that more users/contributors can use them
  ● Automated UI tests, measuring the response time at each commit
30
The road to 50k nodes
● Several types of bottleneck
  ● Policy validation
    – We can’t realistically validate 50,000 policies on the server
    – Policy validation on the client side via two-step policy updates
  ● GUI
    – Paginate results on the server side
      ● Ease client side burden
      ● Improve response rate (especially over slow networks)
    – Switch from Angular to Elm
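The slides do not describe how a two-step policy update would work. One plausible reading, sketched here as an assumption, is that the node downloads new policies into a staging directory, validates them locally, and promotes them only if validation passes. The function name and the validate callback are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

def two_step_update(new_files: dict, live_dir: Path, validate) -> bool:
    """Stage new policies, validate them locally, then replace the live copy."""
    # Step 1: write the downloaded files into a staging directory next to the live one.
    staging = Path(tempfile.mkdtemp(dir=live_dir.parent))
    for name, content in new_files.items():
        (staging / name).write_text(content)
    if not validate(staging):
        shutil.rmtree(staging)    # keep the previous, known-good policies
        return False
    # Step 2: promote the validated copy (simplified: this swap is not atomic).
    if live_dir.exists():
        shutil.rmtree(live_dir)
    staging.rename(live_dir)
    return True
```

The point of the design is that validation cost is paid once per node, on the node, instead of tens of thousands of times on the server.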
31
The road to 50k nodes
● Several types of bottleneck
  ● Network
    – Current protocol is not fit to update hundreds of thousands of files
    – Reports are sent back from nodes to Rudder server via syslog
      ● Missing compression
      ● Rsyslog-psql does one insert/commit in database per received log :(
  ● Policy generation
    – Upgrade or replace StringTemplate to lessen IO
    – More static files
  ● Database
    – Use PostgreSQL 10 partitioning to speed up compliance and archiving
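The cost of one insert/commit per received log can be contrasted with batched inserts. The sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL; the table layout, function name, and batch size are assumptions made for the example.

```python
import sqlite3

def insert_reports_batched(conn, reports, batch_size=1000):
    """Insert reports with one commit per batch instead of one per report."""
    commits = 0
    for start in range(0, len(reports), batch_size):
        batch = reports[start:start + batch_size]
        conn.executemany("INSERT INTO reports (node, message) VALUES (?, ?)", batch)
        conn.commit()
        commits += 1
    return commits

conn = sqlite3.connect(":memory:")   # stand-in database for the example
conn.execute("CREATE TABLE reports (node TEXT, message TEXT)")
reports = [(f"node{i % 50}", "result_success") for i in range(2500)]
commits = insert_reports_batched(conn, reports)
```

Here 2500 reports need 3 commits rather than 2500, which is the kind of saving the slide is asking for on the report ingestion path.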
32
The road to 50k nodes
● Missing features
  ● We can’t expect every user of a given installation to need to manage the whole 50k nodes
    – Fine-grained authorization (OrBAC)
    – Multi-tenancy
    – Federation/Synchronisation of different Rudder servers
      ● A lot of thinking needs to be put in there
  ● Improve collaboration
    – Notifications everywhere!
    – Warn if another user is modifying the current object
  ● Change management
    – Canary testing
    – Ramp-up deployment
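Canary testing and ramp-up deployment both come down to applying a change to a growing share of nodes. A minimal sketch, assuming arbitrary fractions of 1%, 10%, 50%, and 100% (the function name and the fractions are illustrative, not a Rudder feature):

```python
def rollout_waves(nodes, fractions=(0.01, 0.1, 0.5, 1.0)):
    """Split nodes into successive waves; the first wave acts as the canary."""
    waves, done = [], 0
    for fraction in fractions:
        # Each wave must add at least one node, without going past the end.
        target = max(done + 1, round(len(nodes) * fraction))
        target = min(target, len(nodes))
        if target > done:
            waves.append(nodes[done:target])
            done = target
    return waves

nodes = [f"node{i}" for i in range(1000)]
waves = rollout_waves(nodes)
wave_sizes = [len(w) for w in waves]
```

Between waves, the caller would check compliance on the nodes already updated and stop the rollout if it drops.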
33
Final words
● We are very lucky to have great users pushing the limits
  ● A special thank-you to all of you
Dennis, Olivier, Florian, Christophe, Janos, Pierre, Stéphane, Marc, Alexander,
David, Fabrice, Daniel, Dmitry, Ferenc, François, Vincent, Jean, Lionel, Maxime,
Michael, Enrico, Ilan, Jean Marie, Jeremy, …
(and I’m terribly sorry for all those that I did not mention)
● Tools, software and resources evolved during Rudder’s life
  ● They helped improve the scalability as well
Questions?

Contenu connexe

Tendances

Tech Tutorial by Vikram Dham: Let's build MPLS router using SDN
Tech Tutorial by Vikram Dham: Let's build MPLS router using SDNTech Tutorial by Vikram Dham: Let's build MPLS router using SDN
Tech Tutorial by Vikram Dham: Let's build MPLS router using SDNnvirters
 
DEVNET-1175 OpenDaylight Service Function Chaining
DEVNET-1175	OpenDaylight Service Function ChainingDEVNET-1175	OpenDaylight Service Function Chaining
DEVNET-1175 OpenDaylight Service Function ChainingCisco DevNet
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Monal Daxini
 
Barak Perlman, ConteXtream - SFC (Service Function Chaining) Using Openstack ...
Barak Perlman, ConteXtream - SFC (Service Function Chaining) Using Openstack ...Barak Perlman, ConteXtream - SFC (Service Function Chaining) Using Openstack ...
Barak Perlman, ConteXtream - SFC (Service Function Chaining) Using Openstack ...Cloud Native Day Tel Aviv
 
Deployment topologies for high availability (ha)
Deployment topologies for high availability (ha)Deployment topologies for high availability (ha)
Deployment topologies for high availability (ha)Deepak Mane
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaSteven Wu
 
【EPN Seminar Nov.10.2015】 Services Function Chaining Architecture, Standardiz...
【EPN Seminar Nov.10.2015】 Services Function Chaining Architecture, Standardiz...【EPN Seminar Nov.10.2015】 Services Function Chaining Architecture, Standardiz...
【EPN Seminar Nov.10.2015】 Services Function Chaining Architecture, Standardiz...シスコシステムズ合同会社
 
OPNFV Service Function Chaining
OPNFV Service Function ChainingOPNFV Service Function Chaining
OPNFV Service Function ChainingOPNFV
 
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecNetflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecPeter Bakas
 
Design and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative RebalancingDesign and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative Rebalancingconfluent
 
LISP and NSH in Open vSwitch
LISP and NSH in Open vSwitchLISP and NSH in Open vSwitch
LISP and NSH in Open vSwitchmestery
 
Open Connect Firmware Delivery With Spinnaker (Spinnaker Summit 2018)
Open Connect Firmware Delivery With Spinnaker (Spinnaker Summit 2018)Open Connect Firmware Delivery With Spinnaker (Spinnaker Summit 2018)
Open Connect Firmware Delivery With Spinnaker (Spinnaker Summit 2018)Asher Feldman
 
OpenStack & OVS: From Love-Hate Relationship to Match Made in Heaven - Erez C...
OpenStack & OVS: From Love-Hate Relationship to Match Made in Heaven - Erez C...OpenStack & OVS: From Love-Hate Relationship to Match Made in Heaven - Erez C...
OpenStack & OVS: From Love-Hate Relationship to Match Made in Heaven - Erez C...Cloud Native Day Tel Aviv
 
Design and Implementation of a Load Balancing Algorithm for a Clustered SDN C...
Design and Implementation of a Load Balancing Algorithm for a Clustered SDN C...Design and Implementation of a Load Balancing Algorithm for a Clustered SDN C...
Design and Implementation of a Load Balancing Algorithm for a Clustered SDN C...Daniel Gheorghita
 
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Monal Daxini
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016Monal Daxini
 

Tendances (20)

Tech Tutorial by Vikram Dham: Let's build MPLS router using SDN
Tech Tutorial by Vikram Dham: Let's build MPLS router using SDNTech Tutorial by Vikram Dham: Let's build MPLS router using SDN
Tech Tutorial by Vikram Dham: Let's build MPLS router using SDN
 
Kafka - Linkedin's messaging backbone
Kafka - Linkedin's messaging backboneKafka - Linkedin's messaging backbone
Kafka - Linkedin's messaging backbone
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
SDN Project PPT
SDN Project PPTSDN Project PPT
SDN Project PPT
 
DEVNET-1175 OpenDaylight Service Function Chaining
DEVNET-1175	OpenDaylight Service Function ChainingDEVNET-1175	OpenDaylight Service Function Chaining
DEVNET-1175 OpenDaylight Service Function Chaining
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
 
Barak Perlman, ConteXtream - SFC (Service Function Chaining) Using Openstack ...
Barak Perlman, ConteXtream - SFC (Service Function Chaining) Using Openstack ...Barak Perlman, ConteXtream - SFC (Service Function Chaining) Using Openstack ...
Barak Perlman, ConteXtream - SFC (Service Function Chaining) Using Openstack ...
 
Deployment topologies for high availability (ha)
Deployment topologies for high availability (ha)Deployment topologies for high availability (ha)
Deployment topologies for high availability (ha)
 
Chapter9ccna
Chapter9ccnaChapter9ccna
Chapter9ccna
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
【EPN Seminar Nov.10.2015】 Services Function Chaining Architecture, Standardiz...
【EPN Seminar Nov.10.2015】 Services Function Chaining Architecture, Standardiz...【EPN Seminar Nov.10.2015】 Services Function Chaining Architecture, Standardiz...
【EPN Seminar Nov.10.2015】 Services Function Chaining Architecture, Standardiz...
 
OPNFV Service Function Chaining
OPNFV Service Function ChainingOPNFV Service Function Chaining
OPNFV Service Function Chaining
 
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecNetflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
 
Design and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative RebalancingDesign and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative Rebalancing
 
LISP and NSH in Open vSwitch
LISP and NSH in Open vSwitchLISP and NSH in Open vSwitch
LISP and NSH in Open vSwitch
 
Open Connect Firmware Delivery With Spinnaker (Spinnaker Summit 2018)
Open Connect Firmware Delivery With Spinnaker (Spinnaker Summit 2018)Open Connect Firmware Delivery With Spinnaker (Spinnaker Summit 2018)
Open Connect Firmware Delivery With Spinnaker (Spinnaker Summit 2018)
 
OpenStack & OVS: From Love-Hate Relationship to Match Made in Heaven - Erez C...
OpenStack & OVS: From Love-Hate Relationship to Match Made in Heaven - Erez C...OpenStack & OVS: From Love-Hate Relationship to Match Made in Heaven - Erez C...
OpenStack & OVS: From Love-Hate Relationship to Match Made in Heaven - Erez C...
 
Design and Implementation of a Load Balancing Algorithm for a Clustered SDN C...
Design and Implementation of a Load Balancing Algorithm for a Clustered SDN C...Design and Implementation of a Load Balancing Algorithm for a Clustered SDN C...
Design and Implementation of a Load Balancing Algorithm for a Clustered SDN C...
 
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
 

Similaire à How we scaled Rudder to 10k, and the road to 50k

RedisConf18 - Application of Redis in IOT Edge Devices
RedisConf18 - Application of Redis in IOT Edge DevicesRedisConf18 - Application of Redis in IOT Edge Devices
RedisConf18 - Application of Redis in IOT Edge DevicesRedis Labs
 
Introduction to SDN
Introduction to SDNIntroduction to SDN
Introduction to SDNNetCraftsmen
 
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthUSENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthNicolas Brousse
 
Log and control all service-to-service traffic in one place (Kelvin Wong)
Log and control all service-to-service traffic in one place (Kelvin Wong)Log and control all service-to-service traffic in one place (Kelvin Wong)
Log and control all service-to-service traffic in one place (Kelvin Wong)London Microservices
 
Presentation oracle net services
Presentation    oracle net servicesPresentation    oracle net services
Presentation oracle net servicesxKinAnx
 
Row #9: An architecture overview of APNIC's RDAP deployment to the cloud
Row #9: An architecture overview of APNIC's RDAP deployment to the cloudRow #9: An architecture overview of APNIC's RDAP deployment to the cloud
Row #9: An architecture overview of APNIC's RDAP deployment to the cloudAPNIC
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesAlexander Penev
 
High-Speed Reactive Microservices
High-Speed Reactive MicroservicesHigh-Speed Reactive Microservices
High-Speed Reactive MicroservicesRick Hightower
 
Software Architecture for Cloud Infrastructure
Software Architecture for Cloud InfrastructureSoftware Architecture for Cloud Infrastructure
Software Architecture for Cloud InfrastructureTapio Rautonen
 
PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge
 PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge
PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering EdgePROIDEA
 
OpenKilda: Stream Processing Meets Openflow
OpenKilda: Stream Processing Meets OpenflowOpenKilda: Stream Processing Meets Openflow
OpenKilda: Stream Processing Meets OpenflowAPNIC
 
RedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases DistributedRedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases DistributedRedis Labs
 
Using OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting EnvironmentUsing OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting EnvironmentOpenStack Foundation
 
MuleSoft Manchester Meetup #4 slides 11th February 2021
MuleSoft Manchester Meetup #4 slides 11th February 2021MuleSoft Manchester Meetup #4 slides 11th February 2021
MuleSoft Manchester Meetup #4 slides 11th February 2021Ieva Navickaite
 
Kinesis @ lyft
Kinesis @ lyftKinesis @ lyft
Kinesis @ lyftMian Hamid
 
C. Sotiriou, Vodafone Greece: Adopting Quarkus for the digital experience layer
C. Sotiriou, Vodafone Greece: Adopting Quarkus for the digital experience layerC. Sotiriou, Vodafone Greece: Adopting Quarkus for the digital experience layer
C. Sotiriou, Vodafone Greece: Adopting Quarkus for the digital experience layerUni Systems S.M.S.A.
 
Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]Jimmy Angelakos
 

Similaire à How we scaled Rudder to 10k, and the road to 50k (20)

RedisConf18 - Application of Redis in IOT Edge Devices
RedisConf18 - Application of Redis in IOT Edge DevicesRedisConf18 - Application of Redis in IOT Edge Devices
RedisConf18 - Application of Redis in IOT Edge Devices
 
Introduction to SDN
Introduction to SDNIntroduction to SDN
Introduction to SDN
 
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthUSENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
 
Log and control all service-to-service traffic in one place (Kelvin Wong)
Log and control all service-to-service traffic in one place (Kelvin Wong)Log and control all service-to-service traffic in one place (Kelvin Wong)
Log and control all service-to-service traffic in one place (Kelvin Wong)
 
Presentation oracle net services
Presentation    oracle net servicesPresentation    oracle net services
Presentation oracle net services
 
Row #9: An architecture overview of APNIC's RDAP deployment to the cloud
Row #9: An architecture overview of APNIC's RDAP deployment to the cloudRow #9: An architecture overview of APNIC's RDAP deployment to the cloud
Row #9: An architecture overview of APNIC's RDAP deployment to the cloud
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE Architectures
 
Software Defined Networking
Software Defined NetworkingSoftware Defined Networking
Software Defined Networking
 
High-Speed Reactive Microservices
High-Speed Reactive MicroservicesHigh-Speed Reactive Microservices
High-Speed Reactive Microservices
 
Software Architecture for Cloud Infrastructure
Software Architecture for Cloud InfrastructureSoftware Architecture for Cloud Infrastructure
Software Architecture for Cloud Infrastructure
 
IBM Programmable Network Controller
IBM Programmable Network ControllerIBM Programmable Network Controller
IBM Programmable Network Controller
 
PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge
 PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge
PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge
 
OpenKilda: Stream Processing Meets Openflow
OpenKilda: Stream Processing Meets OpenflowOpenKilda: Stream Processing Meets Openflow
OpenKilda: Stream Processing Meets Openflow
 
RedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases DistributedRedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases Distributed
 
Using OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting EnvironmentUsing OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting Environment
 
MuleSoft Manchester Meetup #4 slides 11th February 2021
MuleSoft Manchester Meetup #4 slides 11th February 2021MuleSoft Manchester Meetup #4 slides 11th February 2021
MuleSoft Manchester Meetup #4 slides 11th February 2021
 
Kinesis @ lyft
Kinesis @ lyftKinesis @ lyft
Kinesis @ lyft
 
Dynomite @ RedisConf 2017
Dynomite @ RedisConf 2017Dynomite @ RedisConf 2017
Dynomite @ RedisConf 2017
 
C. Sotiriou, Vodafone Greece: Adopting Quarkus for the digital experience layer
C. Sotiriou, Vodafone Greece: Adopting Quarkus for the digital experience layerC. Sotiriou, Vodafone Greece: Adopting Quarkus for the digital experience layer
C. Sotiriou, Vodafone Greece: Adopting Quarkus for the digital experience layer
 
Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]
 

Plus de RUDDER

What if configuration management didn't need to be lvl60 in dev?
What if configuration management didn't need to be lvl60 in dev?What if configuration management didn't need to be lvl60 in dev?
What if configuration management didn't need to be lvl60 in dev?RUDDER
 
Servers compliance: audit, remediation, proof
Servers compliance: audit, remediation, proofServers compliance: audit, remediation, proof
Servers compliance: audit, remediation, proofRUDDER
 
OSIS 2019 - Qu’apporte l’observabilité à la gestion de configuration ?
OSIS 2019 - Qu’apporte l’observabilité à la gestion de configuration ?OSIS 2019 - Qu’apporte l’observabilité à la gestion de configuration ?
OSIS 2019 - Qu’apporte l’observabilité à la gestion de configuration ?RUDDER
 
OW2Con - Configurations, do you prove yours?
OW2Con - Configurations, do you prove yours?OW2Con - Configurations, do you prove yours?
OW2Con - Configurations, do you prove yours?RUDDER
 
The new plugin ecosystem in RUDDER 5.0
The new plugin ecosystem in RUDDER 5.0The new plugin ecosystem in RUDDER 5.0
The new plugin ecosystem in RUDDER 5.0RUDDER
 
What uses for observing operations of Configuration Management?
What uses for observing operations of Configuration Management?What uses for observing operations of Configuration Management?
What uses for observing operations of Configuration Management?RUDDER
 
UX challenges of a UI-centric config management tool
UX challenges of a UI-centric config management toolUX challenges of a UI-centric config management tool
UX challenges of a UI-centric config management toolRUDDER
 
What happened in RUDDER in 2018 and what’s next?
What happened in RUDDER in 2018 and what’s next?What happened in RUDDER in 2018 and what’s next?
What happened in RUDDER in 2018 and what’s next?RUDDER
 
What is RUDDER and when should I use it?
What is RUDDER and when should I use it?What is RUDDER and when should I use it?
What is RUDDER and when should I use it?RUDDER
 
Fosdem - Configurations do you prove yours?
Fosdem - Configurations  do you prove yours?Fosdem - Configurations  do you prove yours?
Fosdem - Configurations do you prove yours?RUDDER
 
L'audit en continu : clé de la conformité démontrable (#POSS 2018)
L'audit en continu : clé de la conformité démontrable (#POSS 2018)L'audit en continu : clé de la conformité démontrable (#POSS 2018)
L'audit en continu : clé de la conformité démontrable (#POSS 2018)RUDDER
 
Fiabilité et conformité continues en production avec Rudder (#BBOOST 2018)
Fiabilité et conformité continues en production avec Rudder (#BBOOST 2018)Fiabilité et conformité continues en production avec Rudder (#BBOOST 2018)
Fiabilité et conformité continues en production avec Rudder (#BBOOST 2018)RUDDER
 
Stay up - voyage d'un éditeur de logiciels libres
Stay up - voyage d'un éditeur de logiciels libresStay up - voyage d'un éditeur de logiciels libres
Stay up - voyage d'un éditeur de logiciels libresRUDDER
 
What's new and what's next in Rudder
What's new and what's next in RudderWhat's new and what's next in Rudder
What's new and what's next in RudderRUDDER
 
Poss 2017 : gestion des configurations et mise en conformité chez un service ...
Poss 2017 : gestion des configurations et mise en conformité chez un service ...Poss 2017 : gestion des configurations et mise en conformité chez un service ...
Poss 2017 : gestion des configurations et mise en conformité chez un service ...RUDDER
 
Poss 2017 - la continuité, arme secrète de la gestion du si - cas concret de ...
Poss 2017 - la continuité, arme secrète de la gestion du si - cas concret de ...Poss 2017 - la continuité, arme secrète de la gestion du si - cas concret de ...
Poss 2017 - la continuité, arme secrète de la gestion du si - cas concret de ...RUDDER
 
POSS 2017 : Comment automatiser son infrastructure quand... on a pas le temps...
POSS 2017 : Comment automatiser son infrastructure quand... on a pas le temps...POSS 2017 : Comment automatiser son infrastructure quand... on a pas le temps...
POSS 2017 : Comment automatiser son infrastructure quand... on a pas le temps...RUDDER
 
DevOps D-Day 2017 - Gestion des configurations et mise en conformité chez un ...
DevOps D-Day 2017 - Gestion des configurations et mise en conformité chez un ...DevOps D-Day 2017 - Gestion des configurations et mise en conformité chez un ...
DevOps D-Day 2017 - Gestion des configurations et mise en conformité chez un ...RUDDER
 
RUDDER - Continuous Configuration (configuration management + continuous aud...
 RUDDER - Continuous Configuration (configuration management + continuous aud... RUDDER - Continuous Configuration (configuration management + continuous aud...
RUDDER - Continuous Configuration (configuration management + continuous aud...RUDDER
 
RUDDER - Continuous Configuration (configuration management + continuous audi...
RUDDER - Continuous Configuration (configuration management + continuous audi...RUDDER - Continuous Configuration (configuration management + continuous audi...
RUDDER - Continuous Configuration (configuration management + continuous audi...RUDDER
 

How we scaled Rudder to 10k, and the road to 50k

  • 1. How we scaled Rudder to 10k nodes, and the road to 50k nodes
    Nicolas CHARLES, Co-founder and COO, @nico_charles
  • 2. Scalability?
    Scalability is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth
    https://en.wikipedia.org/wiki/Scalability
  • 3. Scalability – why is it an issue in Rudder?
    What does Rudder do?
    ● Users define policies
    ● Users apply them to groups of nodes
    ● Rudder computes the policies for each node
    ● Agents apply them and send back information
    ● Rudder computes the compliance
  • 4. Scalability – why is it an issue in Rudder?
    Each of these points needs to be fast:
    ● Process node inventories quickly
    ● Have a fast UI
    ● Generate policies in a reasonable time
    ● Have fast agents, and don’t overload the network
    ● Have compliance of the actual state available
  • 6. Rudder Architecture
    [Architecture diagram. Rudder Server Root: Interfaces (CLI, Web UI, API), Uses, Applications (Compliance, Configuration, Inventory), Plugins, Rudder Engine, Techniques. Nodes: Rudder Agent, Rudder relay, Rudder Agent.]
  • 7. The origin of Rudder
    ● At first, Rudder was designed for a hundred, or hundreds, of nodes
    ● No real goal for scalability
    ● It was, in retrospect, an MVP
  • 8. The origin of Rudder
    ● Scalability needs grew, driven by:
    ● Users and usage
      – Frustration over slowdowns
      – More managed servers
    ● Features
      – Some features needed much better performance
      – Some needed massive architectural changes
  • 9. First bottlenecks to tackle
    ● Reporting in Rudder
    ● Display compliance of nodes
      – Change the data model, as everything was rule-centric in Rudder 2.3
    ● Slow display of reports and compliance
      – Remember, we were supporting PostgreSQL 8.x at the time
      – Add relevant indexes
    ● Agent side
    ● The agent was already used on critical systems, but impacted the performance of nodes
      – Rewrite some policies
      – Add tooling around the agent to prevent clogging
    ● Rudder 2.5 was not more scalable, but more consistent
  • 10. Scalability – Step by Step
    [Rudder architecture diagram]
    Bandwidth & Network
    - Flag files to detect new policies
    - Relay servers
  • 11. Scalability – Step by Step
    [Rudder architecture diagram]
    Scale the uses
    - Validation workflow
    - Synchronisation of Rudder servers
    - API
    - More Techniques
  • 12. Scalability – Step by Step
    [Rudder architecture diagram]
    Improve performance
    - Save only the changes to inventories (several orders of magnitude faster)
    - Change the data model for compliance (30% faster compliance)
  • 13. Scalability – 2.9 & 2.10
    ● Improving performance is one of the focuses
    ● Refactoring and code improvements to reduce policy generation time
      – Use of hashes and caches
    ● Fighting with the ORM to get lighter queries
      – Far fewer commits
    ● Make the impact on network and nodes adjustable
    ● Configurable agent run frequency: it can be set based on the performance of nodes and the available bandwidth
  • 14. Scalability – 2.9 & 2.10
    ● First industrialized performance tests
      – With Tsung
    ● Generated inventories automatically and sent them to the endpoint
    ● Tests with thousands of inventories
    ● Thank you @cscmeu!
    http://tsung.erlang-projects.org/
  • 15. Scalability – 2.11
    ● Goal: manage thousand nodes
    ● Distributed setup
      – Make Rudder scale by adding more servers for components
    ● UI more responsive to user requests
      – Async
      – LDAP optimizations
    ● No more indexes (everything fits in RAM)
    ● Much faster policy generation
      – Changed variable lookup, more caching
      – Used a bit of parallelism where it was easy
    ● More performance tests
      – A big thank you to users pushing the limits
  • 16. Scale the uses – Rudder 2.11
    ● Technique Editor: everyone can create techniques
    ● Uses ncf
    ● Graphical user interface to make Techniques easier to write
  • 17. Rudder 3
    [Rudder architecture diagram]
    Complete change of UI
    - Design and layout
    Compliance is everywhere
    - Everything is async
    - Everything is cached
  • 18. Rudder 3
    [Rudder architecture diagram]
    New data model: node-centric
    - Compliance is per node
    - Cached
    - And lazily computed
  • 19. Rudder 3
    [Rudder architecture diagram]
    Lightweight reports
    - Change-only reporting
    - Send reports only for changes
    And much less disk usage
  • 20. Rudder 3
    ● For this release, devs had between 1000 and 2000 nodes on their dev systems
    ● A lot of timing information was embedded in Rudder
    ● This made it possible to identify low-hanging fruit
    ● As a result, everything was much faster
    ● A 500 ms compute time with 2000 nodes was considered slow, and reported as a bug
  • 21. Rudder 3.1 – 5000 nodes
    ● Rudder 3.1 – reaching the 5000-node limit (well, 7500 at the end of its life)
    ● This is the land of micro-optimization, pushing the limits of the model
      – Lazy variables to prevent computation of unneeded values
    ● Micro-tuning of techniques to make policy generation faster
      – But we are still talking about 45 minutes for 5000 nodes with policy validation
    ● Massive performance upgrade of the agent
      – Changed the complexity of managing big policies
  • 22. Rudder 3.1 – 5000 nodes
    ● Tooling to generate compliance reports from nodes
    ● Load servers, detect issues in compliance computation
    ● Extensive use of pgBadger to analyze PostgreSQL logs
      – From both test benches and production systems
      – Finding the slow queries and the limits
    ● Thank you @matya_j!
    https://github.com/dalibo/pgbadger
  • 24. Rudder 4.0: massive changes
    ● Policies
    ● Each policy is identified by an id
    ● Change the database model
      – Use Doobie, an excellent ORM that lets you write proper SQL
      – Configuration is stored in JSON rather than spread across JOINs
    ● No “leaking” of policy changes from one node to another
      – Regenerate only for the nodes that have been changed
    ● Policy generation is much faster
      – About 30 times faster (without policy validation)
  • 25. Rudder 4.0: massive changes
    ● Compliance
    ● Compliance is computed when reports are received server side, then cached
      – Compliance display is twice as fast with 1000 nodes, and an order of magnitude faster with 5000 nodes
    ● Audit mode
    ● New LDAP backend (LMDB-based)
  • 26. Rudder 4.1: the road to 10k
    ● UI is much faster
    ● All resources are cached
    ● Compress everything (big impact on bad networks with large installs and distant servers)
    ● Policy generation is pretty fast (if we don’t validate the policies)
    ● About 3 minutes for 7000 nodes
    ● External data sources
    ● Can be triggered by changes in a remote tool
    ● Hooks on events
    ● Allow fine-tuning of the behaviour of node acceptance/deletion/policy generation
    ● Thank you @FlorianHeigl1!
  • 27. Rudder 4.3: 10k
    ● Policy engine has been rewritten
    ● Pluggable, less mutable, a bit faster
    ● We can manage 10k nodes on one Rudder server
    ● Recommended configuration is 11 GB for the Web Interface for 10k nodes
    ● Adding more RAM/CPU/IO is enough to go to 15k nodes
    ● Still not perfect
    ● Policy generation is long with 10k nodes and policy validation activated
    ● UI will be sluggish
      – Because of DOM computations
      – Might be OK with Firefox 59
    ● API will be OK
  • 28. What’s next?
    ● Improve tooling suite
    ● Working with Florian Heigl to automate a super-large test platform
      – Automatically create nodes, rules, reports
      – At a high rate
      – Check application response rate and load
    ● Find new bottlenecks using sysdig
  • 29. What’s next?
    ● Improve tooling suite
    ● Improve usability and documentation of load tools
      – So that more users/contributors can use them
    ● Automated UI tests, measuring the response time at each commit
  • 30. The road to 50k nodes
    ● Several types of bottleneck
    ● Policy validation
      – We can’t realistically validate 50,000 policies on the server
      – Policy validation on the client side via two-step policy updates
    ● GUI
      – Paginate results on the server side
    ● Ease the client-side burden
    ● Improve response rate (especially over slow networks)
      – Switch from Angular to Elm
  • 31. The road to 50k nodes
    ● Several types of bottleneck
    ● Network
      – The current protocol is not fit to update hundreds of thousands of files
      – Reports are sent back from nodes to the Rudder server via syslog
    ● Missing compression
    ● rsyslog-pgsql does one insert/commit in the database per received log :(
    ● Policy generation
      – Upgrade or replace StringTemplate to lessen I/O
      – More static files
    ● Database
      – Use PostgreSQL 10 partitioning to speed up compliance and archiving
  • 32. The road to 50k nodes
    ● Missing features
    ● We can’t expect every user of a given installation to need to manage the whole 50k nodes
      – Fine-grained authorization (OrBAC)
      – Multi-tenancy
      – Federation/synchronisation of different Rudder servers
    ● A lot of thought needs to go into this
    ● Improve collaboration
      – Notifications everywhere!
      – Warn if another user is modifying the current object
    ● Change management
      – Canary testing
      – Ramp-up deployment
  • 33. Final words
    ● We are very lucky to have great users pushing the limits
    ● Special thanks to all of you: Dennis, Olivier, Florian, Christophe, Janos, Pierre, Stéphane, Marc, Alexander, David, Fabrice, Daniel, Dmitry, Ferenc, François, Vincent, Jean, Lionel, Maxime, Michael, Enrico, Ilan, Jean Marie, Jeremy, … (and I’m terribly sorry for all those I did not mention)
    ● Tools, software and resources evolved during Rudder’s life
    ● They helped improve the scalability as well
  • 34. How we scaled Rudder to 10k nodes
    Questions?
    Nicolas CHARLES, Co-founder and COO, @nico_charles
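
Slides 13 and 24 mention "hashes and caches" and regenerating policies "only for the nodes that have been changed". The slides do not show how Rudder implements this, so the following is only an illustrative Python sketch of the general idea, not Rudder's code: hash each node's configuration and regenerate only when the hash differs from the cached one. The function names, the node ids and the configuration layout are all hypothetical.

```python
import hashlib
import json


def config_hash(node_config: dict) -> str:
    """Stable SHA-256 digest of one node's configuration (hypothetical layout)."""
    # sort_keys gives a canonical serialization, so equal configs hash equally.
    canonical = json.dumps(node_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def nodes_to_regenerate(configs: dict, known_hashes: dict) -> list:
    """Return the ids of nodes whose configuration changed, updating the cache."""
    changed = []
    for node_id, node_config in sorted(configs.items()):
        digest = config_hash(node_config)
        if known_hashes.get(node_id) != digest:
            changed.append(node_id)
            known_hashes[node_id] = digest
    return changed


# Hypothetical data: two nodes, then a change that touches only node-2.
cache = {}
configs = {
    "node-1": {"ntp": {"server": "pool.ntp.org"}},
    "node-2": {"ntp": {"server": "pool.ntp.org"}},
}
first_run = nodes_to_regenerate(configs, cache)   # both nodes are new
configs["node-2"] = {"ntp": {"server": "ntp.example.com"}}
second_run = nodes_to_regenerate(configs, cache)  # only node-2 changed
```

With a scheme like this, the cost of a generation run depends on how many nodes actually changed rather than on the total number of nodes, which is the property the 4.0 slide describes as no "leaking" of changes from one node to another.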