Follow the latest DevOps job postings
Partners:
Monitoring & Logging
Marian Marinov
mm@yuhu.biz
Who am I?
● Director of Engineering at Web Hosting Canada
● Former partner and Head of DevOps at SiteGround
● A SysAdmin and System Architect
What do I have to monitor?
● 13 physical Linux machines
○ Storage capacity (df/df -i)
○ S.M.A.R.T. of the drives
○ RAID (HW or Soft)
○ Network (routes, traffic and usage)
○ Performance (CPU, Mem, I/O, Processes)
○ Kernel logs
○ Service logs
● 1 UPS
● 2 APC PDUs
● 2 Switches (SNMP statistics)
● 2 Thermostats (traffic, temp, humidity)
● 40+ LXC containers
○ Performance (CPU, Mem, I/O, Processes)
○ Storage capacity (df/df -i)
○ Service logs
● 2-3 Wi-Fi access points
○ number of attached devices
○ traffic per-device
● A few devices for which I want traffic and power-on time
○ 3 TVs
○ 3 Amplifiers
○ 4 Cameras
○ 1 Washing machine
○ 1 Dryer
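
The storage capacity items above (df / df -i) can be approximated from a script. The following is a minimal sketch, not the setup used in the talk: the names `FsUsage`, `used_pct`, `fs_usage` and `over_threshold`, as well as the 90% limit, are assumptions made for illustration. It reads the same kernel counters as df through `os.statvfs`, but the percentage may differ slightly from the Use% column of df, which also accounts for blocks reserved for root.

```python
import os
from typing import NamedTuple


class FsUsage(NamedTuple):
    """Space and inode usage of one mounted filesystem (hypothetical helper type)."""
    path: str
    space_used_pct: float
    inodes_used_pct: float


def used_pct(total: int, free: int) -> float:
    """Return the used share in percent; 0.0 if the filesystem reports no total."""
    if total <= 0:
        return 0.0
    return 100.0 * (total - free) / total


def fs_usage(path: str) -> FsUsage:
    """Collect block usage (like df) and inode usage (like df -i) for a path."""
    st = os.statvfs(path)
    return FsUsage(
        path=path,
        space_used_pct=used_pct(st.f_blocks, st.f_bfree),
        inodes_used_pct=used_pct(st.f_files, st.f_ffree),
    )


def over_threshold(usage: FsUsage, limit_pct: float = 90.0) -> bool:
    """True when either space or inodes reach the (assumed) alert limit."""
    return usage.space_used_pct >= limit_pct or usage.inodes_used_pct >= limit_pct


if __name__ == "__main__":
    root = fs_usage("/")
    print(root, "ALERT" if over_threshold(root) else "ok")
```

Checking inodes separately matters because a filesystem can run out of inodes while still having free space.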
What I wanted
● Single solution for log and metrics collection
● Single central interface
What I ended up having
● multiple Grafana dashboards
● monitor events, instead of reading logs
● a bunch of different log collectors
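
One way to read "monitor events, instead of reading logs" is to count how often an event occurs in a time window and alert on the count. The sketch below is an assumption about how that could look, not the tooling from the talk; the class name `EventRateAlert` and its parameters are hypothetical.

```python
from collections import deque


class EventRateAlert:
    """Fire when at least `threshold` events fall inside a sliding time window."""

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold
        self.window = window_seconds
        self._times: deque = deque()

    def record(self, timestamp: float) -> bool:
        """Register one event and report whether the alert condition holds."""
        self._times.append(timestamp)
        cutoff = timestamp - self.window
        # Drop events that have aged out of the window.
        while self._times and self._times[0] <= cutoff:
            self._times.popleft()
        return len(self._times) >= self.threshold


if __name__ == "__main__":
    alert = EventRateAlert(threshold=3, window_seconds=60)
    for t in (0, 10, 20, 100):
        print(t, alert.record(t))
```

The alert only needs a timestamp per event, so the collector does not have to parse the log message itself.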
What I tested
● syslog-ng
● rsyslog
● Filebeat
● Prometheus node_exporter
● Loki
● Fluentd
● collectd
● StatsD
● Graylog
● PostgreSQL + Timescale
● Grafana
Conclusions
● there is no one solution to rule them all
● SNMP is still the king for networking
● too many logging formats and DSLs
● collectd was the easiest
○ with the most metrics out-of-the-box
● Elasticsearch + Kibana require too many resources
○ Not usable for smaller setups
● Graylog uses a lot of CPU for the work it does
○ alerts can be based on number of events instead of parsing logs
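
For the SNMP point above, switch traffic is usually derived from interface octet counters (for example `ifInOctets`, a 32-bit counter in IF-MIB) sampled at two points in time. The sketch below only covers the arithmetic, including counter wraparound; fetching the values over SNMP is left out, and the function names are hypothetical.

```python
def counter_delta(prev: int, curr: int, bits: int = 32) -> int:
    """Difference between two counter samples, tolerating a single wraparound."""
    modulus = 1 << bits
    return (curr - prev) % modulus


def rate_bits_per_second(prev_octets: int, curr_octets: int,
                         interval_seconds: float, bits: int = 32) -> float:
    """Average traffic rate between two octet-counter samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counter_delta(prev_octets, curr_octets, bits) * 8 / interval_seconds


if __name__ == "__main__":
    # The counter wrapped between the two samples: 6 octets up to the wrap, 10 after.
    print(counter_delta(4294967290, 10))          # 16
    print(rate_bits_per_second(1000, 2000, 10))   # 800.0
```

On fast links a 32-bit counter can wrap more than once between polls, which this sketch cannot detect; the 64-bit `ifHCInOctets` counters (pass `bits=64`) avoid that in practice.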
Installation / Setup
● basic apt-get:
○ rsyslogd, syslog-ng, fluentd, collectd, filebeat, loki, node_exporter
○ StatsD required a full npm installation
Pros and Cons
● Syslog pros
○ can easily ingest netconsole kernel logging
○ very good performance
○ well documented and standardized interface
● Syslog cons
○ fire and forget
○ the syslog protocol
○ not enough parsing flexibility
○ syslog-ng was heavier than rsyslogd
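
To illustrate "fire and forget": with classic syslog over UDP the sender builds a message and sends a datagram, and nothing comes back to confirm delivery. The sketch below formats an RFC 5424 style line and sends it over UDP. The function names, the host `192.0.2.10` (a documentation address) and the example values are assumptions, not taken from the talk.

```python
import socket
from datetime import datetime, timezone
from typing import Optional


def syslog_pri(facility: int, severity: int) -> int:
    """PRI value as defined by the syslog protocol: facility * 8 + severity."""
    return facility * 8 + severity


def format_rfc5424(facility: int, severity: int, hostname: str, app: str,
                   msg: str, when: Optional[datetime] = None) -> str:
    """Build a minimal RFC 5424 message without PROCID, MSGID or structured data."""
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    timestamp = when.strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"<{syslog_pri(facility, severity)}>1 {timestamp} {hostname} {app} - - - {msg}"


def send_udp(message: str, host: str, port: int = 514) -> None:
    """Send one datagram; there is no acknowledgement, so loss goes unnoticed."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (host, port))


if __name__ == "__main__":
    line = format_rfc5424(16, 6, "web1", "nginx", "upstream timed out")  # local0.info
    send_udp(line, "192.0.2.10")  # hypothetical collector address
```

Because `send_udp` returns as soon as the datagram is handed to the kernel, the sender cannot tell whether the collector received it.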
Pros and Cons
● Loki/node_exporter/Filebeat/Fluentd pros
○ very good parsing capabilities
○ Filebeat was the easiest for me
○ reliable log delivery
○ different integrations
○ ready-made Grafana dashboards
● Loki/node_exporter/Filebeat/Fluentd cons
○ very heavy on CPU
○ Loki did not have a SysV init script :)
Interesting
● OAIEvals Collector - by Nikolay Stankov
DB integrations
1. Prometheus node_exporter
2. Fluentd
3. Filebeat
4. syslog
Not out of the box
● Custom local collectors still have to go directly to your metrics DB
● Having a producer/subscriber layer greatly reduces the performance hit
● Fluentd and Filebeat were the only ones supporting Kafka out of the box
○ https://github.com/hikhvar/mqtt2prometheus
○ https://github.com/toyokazu/fluent-plugin-mqtt-io
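
The producer/subscriber point above can be sketched in-process: the collector only enqueues samples, and a separate subscriber batches them before writing, so the collector never waits on the metrics DB. This is a stand-in for a real broker such as MQTT or Kafka, using only the standard library; `subscriber`, `run_demo` and the batch size are hypothetical names and values.

```python
import queue
import threading
from typing import Callable, List


def subscriber(q: "queue.Queue", sink: Callable[[List[int]], None],
               batch_size: int = 3) -> None:
    """Drain the queue and hand samples to `sink` in batches; None stops the loop."""
    batch: List[int] = []
    while True:
        item = q.get()
        if item is None:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            sink(list(batch))
            batch.clear()
    if batch:
        sink(list(batch))  # flush the final partial batch


def run_demo(samples: List[int], batch_size: int = 3) -> List[List[int]]:
    """Producer side: enqueue samples and return the batches the sink received."""
    q: "queue.Queue" = queue.Queue()
    written: List[List[int]] = []  # stands in for writes to the metrics DB
    worker = threading.Thread(target=subscriber, args=(q, written.append, batch_size))
    worker.start()
    for sample in samples:
        q.put(sample)  # cheap for the producer; no DB round trip here
    q.put(None)
    worker.join()
    return written


if __name__ == "__main__":
    print(run_demo([1, 2, 3, 4, 5, 6, 7]))  # [[1, 2, 3], [4, 5, 6], [7]]
```

Batching on the subscriber side turns many small writes into a few larger ones, which is where most of the reduced load comes from in this sketch.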
Thank you!
NEXT EVENT
Monitoring & Logging
Speaker: Marian Marinov
Date: 19.Mar.2024
Language: Bulgarian
Contacts:
Marian Marinov
Github profile
Facebook profile
What do I have on the containers?
● NextCloud
● Home Assistant
● Mirrors
● VPNs
● NetBox
● Monitoring (Grafana, StatPing)
● Games (Minecraft, CS, PVPGN)
● IRC (server, bouncers, bots)
● Matrix, Mattermost
● Backups
● Streaming (FOSDEM streamer setup)
● DBs (PostgreSQL, MySQL, Redis, DragonFly, Timescale, InfluxDB, Mongo)
● Vitess, ProxySQL
● MPI (Gearman, MQTT, Kafka, RabbitMQ)
● Web stuff - Wiki, HAProxy, Nginx, Varnish
● OpenShift, OpenStack, K8s on VMs and physical
● A lot of other experiments
What storage do I use?
● Local + LVM
● DRBD+OCFS2
● iSCSI
● cLVM + iSCSI
● GlusterFS
● OrangeFS
● I had in the past:
○ Ceph
○ NFS
○ cLVM + ATAoE
○ cLVM + NBD

Dev.bg DevOps March 2024 Monitoring & Logging
