How IBM's Massive POWER9 UNIX Servers Benefit from InfluxDB and Grafana Technology

IBM has been innovating to create new products for its clients and the world for over a century. Customers look to IBM Power Systems to address their hybrid multicloud infrastructure needs. Larger POWER9 servers can have up to 192 CPU cores, 64 TB of memory, dozens of PB of SAN storage and typically run a mixture of AIX (UNIX) and Enterprise Linux (RHEL or SLES) workloads. As part of its sales process, IBM is always benchmarking its new hardware and software which clients use to monitor their systems.

Discover how IBM and its clients are using InfluxDB and Grafana to collect, store and visualize performance data, which is used to monitor and tune for peak performance in ever-changing workload environments. Join this webinar featuring Nigel Griffiths from IBM to learn how you can use InfluxDB and Grafana to improve large production workloads.

  1. 19/11/2020. How IBM's Massive POWER9 UNIX Servers Benefit from InfluxDB and Grafana Technology
     Nigel Griffiths, Advanced Technology, IBM, UK (these are my personal opinions)
     • IBM email: nag@uk.ibm.com
     • Open Source: nigelargriffiths@hotmail.com
     • Twitter: @mr_nmon
     • http://tinyurl.com/njmon - njmon sourceforge project
     • http://tinyurl.com/AIXpert - My 135 Blog
     • https://www.youtube.com/user/nigelargriffiths - 215
     • Grafana Labs, InfluxData
     One chart on
     • 300,000++ people are IBMers: Benchmark Centres, Demonstrations, Services people, Cloud Offerings
     • Very roughly: 1/3rd Software; 1/3rd Services (technical + business); 1/3rd Hardware (Systems) (servers + storage)
  2. Second chart on
     1/3rd Hardware (Systems) (servers + storage)
     • POWER (IBM chip POWER9): OS: Linux, AIX (UNIX), IBM i; 192 CPU cores, 1536 HW threads; 64 TB memory, 64 adapters
     • Z (mainframe, IBM chip z15): OS: z/OS, LinuxONE for Linux
     • Storage: FlashSystem, SAN, NVMe, . . .
     POWER9 Servers
     • Scale-Out (S922, S924): 2U or 4U, 1 or 2 socket, SMT=8, 4 to 24 CPU cores, 4 TB RAM
     • Midrange (E950): 4U, 4 socket, SMT=8, 16 to 48 CPU cores, 16 TB RAM
     • Enterprise (E980): 7U to 22U, 16 socket, SMT=8, 192 CPU cores (1536 programs running at the same time), 64 TB RAM
  3. My claim to fame?
     • Started 25 years ago
     • nmon → Nigel's Monitor
     • OS performance data
     • On screen or CSV file
     • Various graphing tools
     • For AIX and Linux (any HW)
     • nmon for AIX now part of AIX
     • nmon for Linux open source
     • 1,040,108+ downloads (today)
     Things have changed since starting nmon
     • CPUs x 200,000 faster
     • RAM x 1 million larger
     • Network x 10,000 rate
     • Disks, SSD & NVMe: x 500,000 larger, x 10,000 faster
     • nmon file format = quirky & !standard
  4. In 2018: What would I do differently?
     • Every possible statistic
     • Standard format [not .csv]
     • Central database [not local files]
     • Live graphs
  5. In 2018: What would I do differently?
     • Every possible statistic: DONE
     • Standard format: JSON + LP
     • Central database: InfluxDB
     • Live graphs: Grafana
     • JSON → elastic & Splunk
     • LP → telegraf → Prometheus
  6. In 2020:
     • njmon = JSON output to njmond.py central daemon
     • nimon = InfluxDB Line Protocol direct to InfluxDB
     • Want to know more? http://nmon.sourceforge.net/njmon
     In 2020: Improved handling of JSON data
     • JSON continues as a popular, useful format, especially with Python
     • But added: InfluxDB Line Protocol, for the nimon agent to send directly to a remote InfluxDB
     Let's talk about Grafana!
     • Wow!! Every release is like Xmas → we get new toys (graphs)
     • Even a webpage with samples
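The difference between the two output formats on this slide can be sketched in Python. This is only an illustration: the measurement name `cpu`, the `host` tag, and the field names are hypothetical and are not taken from real njmon or nimon output. It shows the same sample rendered as JSON and as one line of InfluxDB Line Protocol (measurement, tags, fields, then a nanosecond timestamp).

```python
import json


def escape_tag(value: str) -> str:
    """Backslash-escape the characters Line Protocol uses as delimiters in tags."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def to_json(host: str, measurement: str, fields: dict, ts_ns: int) -> str:
    """Render one sample as a JSON document (layout is hypothetical)."""
    return json.dumps({"host": host, "timestamp": ts_ns, measurement: fields},
                      sort_keys=True)


def to_line_protocol(host: str, measurement: str, fields: dict, ts_ns: int) -> str:
    """Render one sample as: measurement,tag=value field=value,... timestamp"""
    field_text = ",".join(f"{key}={float(value)}"
                          for key, value in sorted(fields.items()))
    return f"{measurement},host={escape_tag(host)} {field_text} {ts_ns}"


# Hypothetical sample: CPU user and system percentages on host "aix01".
sample = {"user": 12.5, "sys": 3.0}
json_text = to_json("aix01", "cpu", sample, 1605780000000000000)
line_text = to_line_protocol("aix01", "cpu", sample, 1605780000000000000)
```

The Line Protocol form is a single flat line per measurement, which is why an agent can send it straight to the database without a central daemon re-parsing JSON first.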
  7. Let's talk about Grafana!
     (1) My logo = cool
     (2) Donut graph, yum
     (3) Dark mode: Helps you sleep at the desk!
     (4) LED graphic equaliser: draws attention to red stats
     (5) Button single stat and graph: high density
     (6) Blue Ridge Mountain range graph
     (7) Carpet graph (see later)
     End-points with njmon
     Open Source from IBMers, so AIX benefits from the latest Time-Series database & graph engines, from Nigel "Mr nmon" Griffiths
     • Stats: CPU, RAM, Disks, Paging, Volume Groups, Logical Volumes, Networks, Adapters, Kernel stats, Tapes, Uptime, User count, AIO, File systems, System Calls, Processes, NFS, GPFS Spectrum Scale, VIOS virtual disks, VIOS SEA, VIOS virtual networks, VIOS SSP, Linux NVIDIA GPUs, AIX rPerf
     • Recent updates: new faster centralized collector; new direct to InfluxDB = nimon; new YouTube videos for Sys Admins; new Grafana graph templates. See https://tinyurl.com/njmon
     • Very simple endpoint install
     • InfluxDB and Grafana install in 10 minutes
     • Grafana starter dashboards, but prime value is creating any graph you want in seconds
     • JSON output for Elastic (ELK) & Splunk
     • Line Protocol for InfluxDB & Prometheus
  8. n[ji]mon Time-Series Infrastructure
     [Diagram. Components: njmon -e, njmon -w, njmond.py, Python Client, beats, Telegraf, Prometheus, InfluxDB, Grafana. Links labelled: JSON, Line Protocol, Direct, New]
     Boot Strap + InfluxDB, Grafana
     • Both offer a Cloud Service: pay your bill & they run it; remote access to save data; remote access for graphing
     • Both in-house Enterprise: you buy and run on your kit; get extra features; get full support
     • Both offer Open Source: free access to the code; free downloads pre-compiled; for Linux (AMD64, ARM); also MacOS and Windows!
     • Both available on POWER8 + POWER9 on Linux (RHEL & SUSE) & AIX via our friends at https://power-devops.com
  9. Boot Strap + InfluxDB, Grafana
     (1) Install is very quick: 1 minute download, 6 minute install, 3 minute setup + firewall + start up. Just take the defaults. Influx CLI: create database njmon
     (2) Each end-point needs an agent: single small binary + manual pages; "ninstall" script. Platforms: VIOS 2.2.6, VIOS 3.1.0, AIX 6.1, AIX 7.1, AIX 7.2, Ubuntu 18/20, SLES 12/15, RHEL 7/8
  10. Boot Strap + InfluxDB, Grafana
     (3) Each end-point: add a crontab entry
         0 * * * * /usr/lbin/nimon -c 60 -k -i influx -p 8086
     (4) Access Grafana via a browser. Settings: add influx/njmon datasource. From https://grafana.com/dashboards import njmon AIX & Linux dashboards. Enjoy
  11. Anyone heard of the Dolly Parton curve?
     [Chart: CPU BUSY (up to 100%) against TIME across the day. Axis labels: AM, Morning, Lunch, Afternoon, PM, Batch]
  12. Anyone heard of the Dolly Parton curve? Three crunch points
     [Chart: CPU BUSY (up to 100%) against TIME across the day. Axis labels: AM, Morning, Lunch, Afternoon, PM, Batch]
     Problems:
     • Averaging the whole day hides the three crunch points
     • Periodic over a day and over a week (typically busier on Friday)
     • Periodic over a month (end of month extra reporting) and end of year!
     • Batch overrun times
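A small worked example makes the averaging problem concrete. The hourly CPU figures below are invented for illustration (they are not measurements from the talk), shaped to have one batch peak overnight and one peak each in the morning and afternoon. The whole-day mean looks comfortable while three individual hours sit above 90% busy.

```python
from statistics import mean

# Hypothetical CPU busy % for each hour of one day (index = hour, 0 to 23).
hourly_cpu = [
    30, 60, 95, 60, 30, 20, 20, 30,   # 00:00-07:59, overnight batch peaks at 02:00
    60, 96, 70, 50, 35, 50, 92, 70,   # 08:00-15:59, morning and afternoon peaks
    50, 40, 30, 25, 20, 20, 25, 30,   # 16:00-23:59, quiet evening
]


def crunch_hours(samples, threshold=90):
    """Return the hours whose CPU busy % is at or above the threshold."""
    return [hour for hour, busy in enumerate(samples) if busy >= threshold]


daily_average = mean(hourly_cpu)   # a single number for the whole day
peaks = crunch_hours(hourly_cpu)   # the hours an average would hide

print(f"daily average: {daily_average:.1f}%  crunch hours: {peaks}")
```

With these made-up numbers the daily figure stays under 50%, so a report based on it alone would give no hint that the machine is nearly saturated three times a day.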
  13. Heat map for whole days using the Grafana Carpet Plugin
     • This is an excellent way to determine the busy day + busy hours = first step for trend forecasting
     • [Heat map panels: Week, Week, Week]
     • Heat Map Warning: There are always red parts!
     • Interesting: peaks 8 to 10 am & 2 pm, Tuesday to Friday; busy day is Thursday
  14. My to do list:
     • Work out how to graph CPU on successive Fridays 8 am to 10 pm
     • Batch overrun can be handled with alerts but still need trending
     • Ideas to nag@uk.ibm.com
     • Could be done in: InfluxDB "flux" or Grafana Alerts
     Some ideas
     [Chart labels: Fri, Fri, Fri, Fri, Friday] (1) Remove the weeds (2) One graph with overlay selected time periods (3)
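The "successive Fridays" idea can be sketched outside the database. The slide suggests Flux or Grafana; the Python below is a stand-in for the logic only, not the author's solution. It keeps samples that fall on a Friday between 8 am and 10 pm and keys each one by time of day, so every Friday becomes a separate series that can be overlaid on one graph. The function name and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def friday_overlay(samples, start_hour=8, end_hour=22):
    """Group (timestamp, cpu) samples into one series per Friday.

    Each series maps a time of day ("HH:MM") to the CPU value, so series
    from different weeks share the same x-axis and can be overlaid.
    """
    series = defaultdict(dict)
    for stamp, cpu in samples:
        # datetime.weekday(): Monday is 0, so Friday is 4.
        if stamp.weekday() == 4 and start_hour <= stamp.hour < end_hour:
            series[stamp.date().isoformat()][stamp.strftime("%H:%M")] = cpu
    return dict(series)


# Hypothetical hourly samples for three weeks starting Monday 2 Nov 2020.
start = datetime(2020, 11, 2)
samples = [(start + timedelta(hours=h), (h * 7) % 100) for h in range(21 * 24)]

overlay = friday_overlay(samples)
```

Everything that is not a Friday working-hours sample is dropped before graphing, which corresponds to the "remove the weeds" step on the slide.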
  15. Two recent ideas:
     (1) Not easy to document measures & statistics names! [Tried to find out how many stats from Linux statd?]
     (2) Capturing ad-hoc stats on Big Production Servers
     Answers: AIXpert Blog
  16. Measure Statistics: InfluxDB Measure for AIX and Linux
     Grafana | CPU | Memory | Disks | Network | Kernel | Processes
     Saving other statistics to the same njmon database. If you can get the data via a script, you can send it on with the same njmon tags in 1/100th of a second. Then graph OS stats & your stats at the same time.
     • RDBMS script: measure* -g rdbms -G commits=986.34,rollbacks=23.1,hitratio=99.3
     • Sales script: measure* -g sales -G itemsold=32984,avgcost=79.99,profit=-0.003
     • Users script: measure* -g user -G online=65389,online_mins=184,click_pm=18.2
     • IT-tasks times script: measure* -g tasks -G dataload=47_min,backupmin=124,batch_min=84
     * Also need InfluxDB: hostname + port & Influx-DB-name
     Raspberry Pi 3, MicroSD card, with five temperature probes
     • [Graph annotations: Pi returning temp of zero; Pi fell off network; effect of outside air temperature rising to 32C]
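What a tool like measure has to produce can be sketched in Python. This is an assumption-laden illustration, not the measure program's actual implementation: it parses a `name=value,...` string like the `-G` arguments above, builds one InfluxDB Line Protocol line, and builds the InfluxDB 1.x `/write` URL from the hostname, port and database name the slide says are also needed. The `host` tag name, the function names, and the hostnames are hypothetical.

```python
from urllib.parse import urlencode


def parse_fields(text: str) -> dict:
    """Split 'a=1,b=2' into a dict; values that are not numbers stay as strings."""
    fields = {}
    for pair in text.split(","):
        key, _, raw = pair.partition("=")
        try:
            fields[key] = float(raw)
        except ValueError:
            fields[key] = raw
    return fields


def format_field(value) -> str:
    """Line Protocol writes floats bare and strings inside double quotes."""
    if isinstance(value, float):
        return repr(value)
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_line(group: str, host: str, fields: dict) -> str:
    """One line: measurement,tag=value field=value,... (server assigns the time)."""
    body = ",".join(f"{key}={format_field(value)}" for key, value in fields.items())
    return f"{group},host={host} {body}"


def write_url(hostname: str, port: int, database: str) -> str:
    """InfluxDB 1.x write endpoint; the line above would be the POST body."""
    return f"http://{hostname}:{port}/write?" + urlencode({"db": database})


line = build_line("rdbms", "aix01",
                  parse_fields("commits=986.34,rollbacks=23.1,hitratio=99.3"))
url = write_url("influx", 8086, "njmon")
```

Note that a value such as `47_min` from the IT-tasks example is not a number, so this sketch stores it as a quoted string field; a numeric field would be needed if it is to be graphed as a time series.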
  17. njmon Graphing LAB + InfluxDB & Grafana 7.1 - Wednesday
     Nigel Griffiths, Technical Staff Member
     • nag@uk.ibm.com
     • @mr_nmon & on LinkedIn
     • https://www.youtube.com/nigelargriffiths
     • http://tinyurl.com/AIXpert
     • Cloud VM's for AIX provided by IBM TechU Oct 2020 | Virtual
     • YouTube: https://youtu.be/XKs5dKGuFe8
     • [Picture captions: AIX Performance Tuning Lab; Monitoring Students]
     If you want to know more . . .
     • Project Website: https://tinyurl.com/njmon
     • AIXpert Blog Articles: https://www.ibm.com/support/pages/aixpert-blog-nigel-griffiths-mrnmon
     • YouTube Videos: details on the next slide
  18. Now a 13 part YouTube playlist: njmon + InfluxDB + Grafana for monitoring AIX & Linux Performance data
     • https://www.youtube.com/user/nigelargriffiths
     • https://www.youtube.com/watch?v=wN5GNc9HH7Y&list=PLKQlFnmiWVydb5QdX2wz9iRfJkuuB2ec1
     • 13 videos, ~3 hours, ~16,000 views up to Oct 2020
     Summary: nmon not going away
     • On screen or data capture
     • Stable down stream infrastructure
     • Very popular & part of AIX
     Summary: njmon for new age online tooling
     • Lightweight single binary agent coded in C
     • Loads more stats
     • Real-time, data stream, flexible
     • Python's JSON parser = fast and cool
     • Or direct to InfluxDB
     • AIX vast array of perfstat stats + VIOS stats
     • Linux nmon & njmon code synergy
     • 100's of new stats including GPU & GPFS
     Questions
     • email nigelargriffiths@hotmail.com
     • https://tinyurl.com/njmon
     • https://www.linkedin.com/in/nigel-griffiths