IBM has been innovating to create new products for its clients and the world for over a century. Customers look to IBM Power Systems to address their hybrid multicloud infrastructure needs. Larger POWER9 servers can have up to 192 CPU cores, 64 TB of memory and dozens of PB of SAN storage, and typically run a mixture of AIX (UNIX) and Enterprise Linux (RHEL or SLES) workloads. As part of its sales process, IBM is always benchmarking its new hardware and software which clients use to monitor their systems.
Discover how IBM and its clients are using InfluxDB and Grafana to collect, store and visualize performance data, which is used to monitor and tune for peak performance in ever-changing workload environments. Join this webinar featuring Nigel Griffiths from IBM to learn how you can use InfluxDB and Grafana to improve large production workloads.
How IBM's Massive POWER9 UNIX Servers Benefit from InfluxDB and Grafana Technology
1. 19/11/2020
Nigel Griffiths Advanced Technology, IBM, UK
- These are my personal opinions -
IBM email: nag@uk.ibm.com
Open Source: nigelargriffiths@hotmail.com
@mr_nmon twitter
http://tinyurl.com/njmon - njmon sourceforge project
http://tinyurl.com/AIXpert - My 135 Blog
https://www.youtube.com/user/nigelargriffiths - 215
Grafana Labs, InfluxData
300,000++ people are IBMers
Benchmark Centres, Demonstrations, Services people, Cloud Offerings
Very roughly
• 1/3rd Software
• 1/3rd Services (technical + business)
• 1/3rd Hardware (Systems) (servers + storage)
One chart on
1/3rd Hardware (Systems)
• (servers + storage)
• POWER (IBM chip POWER9)
  - OS: Linux, AIX (UNIX), IBM i
  - 192 CPU cores, 1536 HW threads
  - 64 TB memory, 64 adapters
• Z (mainframe, IBM chip z15)
  - OS: z/OS, LinuxONE for Linux
• Storage
  - FlashSystem, SAN, NVMe, . . .
Second chart on
POWER9 Servers
Scale-Out (S922, S924): 2U or 4U, 1 or 2 socket, SMT=8, 4 to 24 CPU cores, 4 TB RAM
Midrange (E950): 4U, 4 socket, SMT=8, 16 to 48 CPU cores, 16 TB RAM
Enterprise (E980): 7U to 22U, 16 socket, SMT=8, 192 CPU cores (1536 programs running at the same time), 64 TB RAM
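The "1536 programs running at the same time" figure follows from CPU cores multiplied by SMT threads per core. A minimal sketch of that arithmetic, using the largest core counts from the chart above (the function name is only illustrative):

```python
def hardware_threads(cores: int, smt: int = 8) -> int:
    """Hardware threads = CPU cores x SMT threads per core."""
    return cores * smt


if __name__ == "__main__":
    # Largest core count quoted for each server class in the chart.
    for server_class, cores in [("Scale-Out", 24), ("Midrange", 48), ("Enterprise", 192)]:
        print(f"{server_class}: {cores} cores x SMT8 = {hardware_threads(cores)} HW threads")
```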
My claim to fame?
Started 25 years ago
nmon Nigel’s Monitor
OS performance data
On screen or CSV file
Various graphing tools
For AIX and Linux (any HW)
nmon for AIX now part of AIX
nmon for Linux open source
1,040,108+ downloads (today)
Things have changed
since starting nmon
- CPUs x 200,000 faster
- RAM x 1 million larger
- Network x 10,000 rate
- Disks, SSD & NVMe
- x 500,000 larger
- x 10,000 faster
- nmon file format
= quirky & !standard
In 2018:
What would I do differently?
Every possible statistic
Standard format [not .csv]
Central database [not local files]
Live graphs
Every possible statistic DONE
Standard format: JSON + LP
Central database: InfluxDB
Live graphs: Grafana
JSON: Elastic & Splunk
LP: Telegraf & Prometheus
In 2020:
njmon = JSON output to
njmond.py central daemon
nimon = InfluxDB Line Protocol
direct to InfluxDB
Want to know more?
http://nmon.sourceforge.net/njmon
In 2020:
Improved handling of JSON data
Continues, as JSON is a popular,
useful format, especially with Python
But added:
InfluxDB Line Protocol for direct
nimon agent to remote InfluxDB
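To make the two formats concrete, here is a minimal Python sketch that takes one JSON snapshot and renders it as InfluxDB Line Protocol. The snapshot layout, section names and tag names are invented for illustration and are not taken from the real njmon output; only the Line Protocol syntax itself (measurement, comma-separated tags, a space, then fields) is standard.

```python
import json


def _escape(text, specials=(",", "=", " ")):
    """Backslash-escape characters that are special in Line Protocol keys and tags."""
    for ch in specials:
        text = text.replace(ch, "\\" + ch)
    return text


def _field_value(value):
    """Render a Python value as a Line Protocol field value."""
    if isinstance(value, bool):          # bool first: it is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"               # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'                # strings are double-quoted


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Build one line: measurement,tag=value,... field=value,... [timestamp]."""
    name = _escape(measurement, specials=(",", " "))
    tag_part = "".join(f",{_escape(k)}={_escape(str(v))}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{_escape(k)}={_field_value(v)}" for k, v in fields.items())
    line = f"{name}{tag_part} {field_part}"
    return line if timestamp_ns is None else f"{line} {timestamp_ns}"


def snapshot_to_lines(snapshot_json):
    """Turn each dictionary-valued section of the snapshot into one line."""
    snapshot = json.loads(snapshot_json)
    tags = {"host": snapshot["host"], "os": snapshot["os"]}   # hypothetical tag names
    return [to_line_protocol(section, tags, stats)
            for section, stats in snapshot.items() if isinstance(stats, dict)]


# Hypothetical snapshot, NOT real njmon output.
SAMPLE = ('{"host": "aix72", "os": "AIX", '
          '"cpu_total": {"user": 12.5, "sys": 3.1, "idle": 84.4}, '
          '"mem": {"real_free_mb": 2048}}')

if __name__ == "__main__":
    for line in snapshot_to_lines(SAMPLE):
        print(line)
```

Each top-level section becomes a measurement and the host details become tags, so graphs can later be filtered by host.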
Let's talk about Grafana!
Wow!!
Every release is like Xmas: we get new toys (graphs)
- Even a webpage with samples
1. My logo = cool
2. Donut graph, yum
3. Dark mode: Helps you sleep at the desk!
4. LED graphic equaliser: draws attention to red stats
5. Button single stat and graph: high density
6. Blue Ridge Mountain range graph
7. Carpet graph – see later
Open Source from IBMers
So AIX benefits from the latest Time-Series database
& graph engines from Nigel “Mr nmon” Griffiths
Stats:
CPU
RAM
Disks
Paging
Volume Groups
Logical Volumes
Networks
Adapters
Kernel stats
Tapes
Uptime
User count
AIO
File systems
System Calls
Processes
NFS
GPFS Spectrum Scale
VIOS virtual disks
VIOS SEA
VIOS virtual networks
VIOS SSP
Linux NVIDIA GPUs
AIX rPerf
Recent updates:
- New faster centralized collector
- New direct to InfluxDB = nimon
- New YouTube videos for Sys Admins
- New Grafana graph templates
See https://tinyurl.com/njmon
Very simple endpoint install
InfluxDB and Grafana install in 10 minutes
Grafana starter dashboards but prime value
is creating any graph you want in seconds
JSON output for Elastic (ELK) & Splunk
Line Protocol for InfluxDB & Prometheus
End-points
with njmon
Boot Strap: InfluxDB + Grafana
1. Install InfluxDB and Grafana. Install is very quick
   - 1 minute download
   - 6 minute install
   - 3 minute setup + firewall + start up
   - Just take the defaults
   - Influx CLI: create database njmon
2. Each end-point needs an agent
   - Single small binary + manual pages
   - “ninstall” script
   - End-points: VIOS 2.2.6, VIOS 3.1.0, AIX 6.1, AIX 7.1, AIX 7.2, Ubuntu 18/20, SLES 12/15, RHEL 7/8
3. Each end-point: add a crontab entry
   0 * * * * /usr/lbin/nimon -c 60 -k -i influx -p 8086
4. Access Grafana via a browser
   - Settings: add influx/njmon datasource
   - From https://grafana.com/dashboards
   - Import njmon AIX & Linux dashboards
   - Enjoy
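The "create database njmon" step can also be scripted instead of typed into the Influx CLI. The sketch below assumes an InfluxDB 1.x server with authentication switched off, reachable under the host name "influx" on port 8086 (the values in the crontab entry above), and uses the 1.x HTTP /query endpoint. The function names are illustrative only.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def create_database_request(host="influx", port=8086, database="njmon"):
    """Build the HTTP POST that asks InfluxDB 1.x to create a database."""
    body = urlencode({"q": f"CREATE DATABASE {database}"}).encode("ascii")
    return Request(f"http://{host}:{port}/query", data=body, method="POST")


def create_database(host="influx", port=8086, database="njmon", timeout=5.0):
    """Send the request and return the HTTP status code (needs a reachable server)."""
    with urlopen(create_database_request(host, port, database), timeout=timeout) as reply:
        return reply.status


if __name__ == "__main__":
    # Only show what would be sent; call create_database() to actually send it.
    request = create_database_request()
    print(request.get_method(), request.full_url, request.data.decode("ascii"))
```

Keeping the request builder separate from the sender means the URL and query can be checked without a running server.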
Anyone heard of the Dolly Parton curve?
Three Crunch points
[Figure: CPU busy (up to 100%) against time of day (AM to PM), with peaks labelled Morning, Afternoon and Batch, and a dip at Lunch]
Problems:
Averaging the whole day hides the three crunch points
Periodic over a day and over a week (typically busier on Friday)
Periodic over a month (end of month extra reporting) and end of year!
Batch overrun times
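A small worked example of the first problem: with the made-up hourly CPU figures below (invented for illustration, shaped roughly like the curve described above), the whole-day average looks comfortable while several individual hours are at or above 80% busy.

```python
from statistics import mean

# Synthetic CPU-busy percentages for hours 0..23 of one day (invented data):
# overnight batch, a morning peak, a lunch dip and an afternoon peak.
hourly_cpu = [95, 90, 85, 30, 20, 20, 25, 40, 75, 90, 85, 60,
              35, 55, 88, 80, 60, 40, 30, 25, 30, 60, 85, 95]


def daily_average(samples):
    """One number for the whole day: this is what hides the crunch points."""
    return mean(samples)


def crunch_hours(samples, threshold=80.0):
    """Hours of the day whose CPU busy is at or above the threshold."""
    return [hour for hour, busy in enumerate(samples) if busy >= threshold]


if __name__ == "__main__":
    print("Whole-day average:", daily_average(hourly_cpu))
    print("Peak hour value:  ", max(hourly_cpu))
    print("Crunch hours:     ", crunch_hours(hourly_cpu))
```

The same comparison applies to weekly and monthly averages, which smooth away the busy Friday or the end-of-month reporting run.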
Heat map for whole days using the Grafana Carpet Plugin
This is an excellent way to determine the busy day + busy hours = first step for trend forecasting
[Figure: carpet plot heat map, labelled week by week]
Heat Map Warning: There are always red parts!
Interesting peaks: 8 to 10 am & 2 pm, Tuesday to Friday
Busy day is Thursday
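The idea behind the carpet graph can be sketched in a few lines of Python: bucket every sample by (day of week, hour of day), average each bucket, then look for the hottest cell and the busiest day. This is only an illustration of the bucketing, not how the Grafana plugin is implemented, and the week of data is synthetic.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def carpet(samples):
    """samples: iterable of (datetime, cpu_busy). Returns {(weekday, hour): mean busy}."""
    buckets = defaultdict(list)
    for stamp, busy in samples:
        buckets[(stamp.weekday(), stamp.hour)].append(busy)
    return {cell: mean(values) for cell, values in buckets.items()}


def busiest_cell(grid):
    """The (weekday, hour) cell with the highest average (first one wins a tie)."""
    return max(grid, key=grid.get)


def busiest_day(grid):
    """The weekday whose hourly cells have the highest overall average."""
    per_day = defaultdict(list)
    for (weekday, _hour), busy in grid.items():
        per_day[weekday].append(busy)
    return max(per_day, key=lambda day: mean(per_day[day]))


def synthetic_week(start=datetime(2020, 11, 16)):
    """Invented hourly data for one week starting on a Monday."""
    samples = []
    for offset in range(7 * 24):
        stamp = start + timedelta(hours=offset)
        busy = 20.0
        if stamp.weekday() in (1, 2, 3, 4) and stamp.hour in (8, 9, 14):
            busy = 70.0                  # peaks Tuesday to Friday
        if stamp.weekday() == 3:
            busy += 20.0                 # Thursday is busier all day
        samples.append((stamp, busy))
    return samples


if __name__ == "__main__":
    grid = carpet(synthetic_week())
    day, hour = busiest_cell(grid)
    print("Hottest cell:", DAY_NAMES[day], f"{hour:02d}:00", grid[(day, hour)])
    print("Busiest day: ", DAY_NAMES[busiest_day(grid)])
```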
My to do list:
Work out how to graph CPU on
successive Fridays 8 am to 10 pm
Batch overrun can be handled
with alerts but still need trending
Ideas to nag@uk.ibm.com
Could be done in:
InfluxDB “flux” or
Grafana Alerts
Some ideas
Fri Fri Fri Fri Friday
(1) Remove the weeds
(2) One graph with overlay
selected time periods
(3)
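One possible take on idea (2), sketched client-side in Python rather than in Flux or Grafana: keep only the Friday samples inside the chosen hours and group them by date, so each Friday becomes one series plotted against time of day. The function name, the hour window defaults and the synthetic data are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

FRIDAY = 4  # datetime.weekday(): Monday is 0


def friday_overlay(samples, first_hour=8, last_hour=22):
    """Group (datetime, value) samples into one series per Friday.

    Keeps samples from first_hour up to, but not including, last_hour.
    Returns {date: [(time_of_day, value), ...]} ready to overlay on one graph.
    """
    series = defaultdict(list)
    for stamp, value in samples:
        if stamp.weekday() == FRIDAY and stamp.hour in range(first_hour, last_hour):
            series[stamp.date()].append((stamp.time(), value))
    return dict(series)


def synthetic_hours(start=datetime(2020, 11, 2), weeks=3):
    """Invented hourly samples starting on a Monday."""
    return [(start + timedelta(hours=h), float(h % 24)) for h in range(weeks * 7 * 24)]


if __name__ == "__main__":
    for day, points in friday_overlay(synthetic_hours()).items():
        print(day, len(points), "points from", points[0][0], "to", points[-1][0])
```

Because every series shares the same time-of-day axis, successive Fridays can be drawn on top of each other to see whether the busy period is growing.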
Two recent ideas:
1. Not easy to document
measures & statistics names!
[Tried to find out how many stats from Linux statd?]
2. Capturing ad-hoc stats on Big
Production Servers
Answers: AIXpert Blog
[Diagram: Grafana graphing CPU, Memory, Disks, Network, Kernel and Processes statistics held in InfluxDB]
Measure for AIX and Linux
Saving other statistics to the same njmon database.
If you can get the data via a script, you can send it
on with the same njmon tags in 1/100th of a second.
Then graph OS stats & your stats at the same time.
Measure Statistics
RDBMS script:
measure* -g rdbms -G commits=986.34,rollbacks=23.1,hitratio=99.3
Sales script:
measure* -g sales -G itemsold=32984,avgcost=79.99,profit=-0.003
Users script:
measure* -g user -G online=65389,online_mins=184,click_pm=18.2
IT-tasks times script:
measure* -g tasks -G dataload_min=47,backupmin=124,batch_min=84
* Also need InfluxDB: hostname + port & Influx-DB-name
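For readers who prefer to send such statistics from their own script, the sketch below shows roughly what is involved. It is not the measure program: it is a stand-alone illustration that builds one Line Protocol line from a group name and some numbers, then posts it to the InfluxDB 1.x /write endpoint. The single "host" tag, the default server name "influx", port 8086 and database "njmon" are assumptions, and names containing spaces or commas would need escaping.

```python
import socket
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def stats_line(group, stats, host=None):
    """Build 'group,host=NAME key=value,...' from a dict of numeric statistics."""
    host = host or socket.gethostname()
    fields = ",".join(f"{name}={float(value)}" for name, value in stats.items())
    return f"{group},host={host} {fields}"


def write_request(line, influx_host="influx", port=8086, database="njmon"):
    """Build the HTTP POST for the InfluxDB 1.x /write endpoint."""
    url = f"http://{influx_host}:{port}/write?" + urlencode({"db": database})
    return Request(url, data=line.encode("utf-8"), method="POST")


def send(line, **where):
    """Post one line; InfluxDB 1.x answers 204 No Content on success."""
    with urlopen(write_request(line, **where), timeout=5) as reply:
        return reply.status


if __name__ == "__main__":
    line = stats_line("rdbms", {"commits": 986.34, "rollbacks": 23.1, "hitratio": 99.3},
                      host="db01")
    print(line)                          # call send(line) to actually store it
```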
Raspberry Pi 3, MicroSD card, with five temperature probes
[Graph annotations: Pi returning temp of zero; Pi fell off network; effect of outside air temperature rising to 32C]
njmon Graphing LAB
+ InfluxDB & Grafana 7.1
- Wednesday
Nigel Griffiths, Technical Staff Member
nag@uk.ibm.com
@mr_nmon & on LinkedIn
https://www.youtube.com/nigelargriffiths
http://tinyurl.com/AIXpert
Cloud VMs for AIX provided by
IBM TechU
Oct 2020 | Virtual
YouTube:- https://youtu.be/XKs5dKGuFe8
If you want to
know more . . .
Project Website
https://tinyurl.com/njmon
AIXpert Blog Articles
https://www.ibm.com/support/pages/aixpert-blog-nigel-griffiths-mrnmon
YouTube Videos
- Details on the Next slide
AIX Performance
Tuning Lab
Monitoring Students
https://www.youtube.com/user/nigelargriffiths
https://www.youtube.com/watch?v=wN5GNc9HH7Y&list=PLKQlFnmiWVydb5QdX2wz9iRfJkuuB2ec1
13 videos
~16,000 Views
up to Oct 2020
~3 hours
Now a 13 part YouTube playlist
njmon + InfluxDB + Grafana
for monitoring AIX & Linux
Performance data
Summary:
nmon not going away
• On screen or data capture
• Stable downstream infrastructure
• Very popular & part of AIX
njmon for new age online tooling
• Lightweight single binary agent coded in C
• Loads more stats
• Real-time, data stream, flexible
• Python’s JSON parser = fast and cool
• Or direct to InfluxDB
• AIX vast array of perfstat stats + VIOS stats
• Linux nmon & njmon code synergy
• 100’s of new stats including GPU & GPFS
email nigelargriffiths@hotmail.com
Questions
https://tinyurl.com/njmon
https://www.linkedin.com/in/nigel-griffiths