Linux on System z Optimizing Resource Utilization for Linux under z/VM – Part II
- 1. 2012-03-15
Linux on System z
Optimizing Resource Utilization for Linux
under z/VM – Part II
Dr. Juergen Doelle
IBM Germany Research & Development
visit us at http://www.ibm.com/developerworks/linux/linux390/perf/index.html
© 2012 IBM Corporation
- 2. Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product and service names
might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the
Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the
United States, other countries, or both.
2 © 2012 IBM Corporation
- 3. Agenda
Introduction
Methods to pause a z/VM guest
WebSphere Application Server JVM stacking
Guest scaling and DCSS usage
Summary
3 © 2012 IBM Corporation
- 4. Introduction
A virtualized environment
– runs operating systems such as Linux on virtual hardware
– relies on a hypervisor, such as z/VM, to back the virtual hardware with physical hardware as required
– provides significant advantages with regard to management and resource utilization
– introduces a new dynamic into the system, because the guests run on shared resources
The important part is the 'as required'
– normally, not all guests need all of their assigned resources all the time
– The hypervisor must be able to identify resources which are not in use.
This presentation analyzes two issues
– Idling applications that look for work so frequently that they appear active to the hypervisor
• Concerns many applications running with multiple processes/threads
• Sample: 'noisy' WebSphere Application Server
• Solution: pausing guests which are known to be unused for a longer period
– A configuration question: what is better, vertical or horizontal application stacking?
• stacking many applications/middleware in one very large guest
• having many guests, one for each application/middleware
• or something in between
• Sample: 200 WebSphere JVMs
• Solution: analyze the behavior of setup variations
4 © 2012 IBM Corporation
- 5. Agenda
Introduction
Methods to pause a z/VM guest
WebSphere Application Server JVM stacking
Guest scaling and DCSS usage
Summary
5 © 2012 IBM Corporation
- 6. Introduction
The requirement:
– An idling guest should not consume CPU or memory.
– Definition of idle: a guest that is not processing any client requests is idle
• This is the customer view, not the view of the operating system or hypervisor.
The issue:
– An 'idling' WebSphere Application Server guest looks for work so frequently that it never becomes
idle from the hypervisor's view.
– For z/VM: it cannot set the guest state to dormant, so the resources, especially the memory, are
considered actively used:
• z/VM could easily reuse real memory pages from dormant guests for other active guests
• But the memory pages of idling WebSphere guests stay in real memory and are not paged out
• The memory pages of an idling WebSphere Application Server compete with really active guests for real memory
pages
– This behavior is expected not to be specific to WebSphere Application Server
A solution
– Guests which are known to be inactive, for example when the developer has finished his work, are made
inactive to z/VM by
• using the Linux suspend mechanism (hibernates Linux)
• using the z/VM stop command (stops the virtual CPUs)
– This should help to increase the level of memory overcommitment significantly, for example on systems
hosting WAS environments for development groups working worldwide
6 © 2012 IBM Corporation
- 7. How to pause the guest
Linux Suspend/Resume
– Setup: dedicated swap disk to hold the full guest memory
zipl.conf: resume=<swap device_node>
Optional: add a boot target with the noresume parameter as failsafe entry
– Suspend: echo disk >/sys/power/state
– Impact: controlled halt of Linux and its devices
– Resume: just IPL the guest
– more details in the Device Drivers book:
http://www.ibm.com/developerworks/linux/linux390/documentation_dev.html
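A minimal sketch of this setup and the suspend trigger, assuming the dedicated swap DASD appears in the guest as /dev/dasdb1 (the device node and paths are illustrative, not taken from the tests):

  # one-time preparation of the swap device that will hold the memory image
  mkswap /dev/dasdb1
  swapon /dev/dasdb1

  # /etc/zipl.conf (excerpt): point resume= at the swap device; keep a
  # failsafe boot target with the noresume parameter as described above
  #   parameters = "root=/dev/dasda1 resume=/dev/dasdb1"
  zipl                           # rewrite the boot record after the change

  # trigger the suspend; the guest hibernates to the swap disk
  echo disk > /sys/power/state
  # to resume, simply IPL the guest again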
z/VM CP Stop/Begin
– Setup: Privilege class: G
– Stop: CP stop cpu all (might be issued from vmcp)
– Impact: stops the virtual CPUs; execution is simply halted,
device states might remain undefined
– Resume: CP begin cpu all (from x3270 session, disconnect when done)
– more details in CP command reference:
http://publib.boulder.ibm.com/infocenter/zvm/v6r1/index.jsp?topic=/com.ibm.zvm.v610.hcpb7/toc.htm
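A minimal sketch of the stop/begin sequence, assuming the vmcp interface is available in the guest (privilege class G, as stated above):

  # from inside the guest: halt all virtual CPUs; this is the last command
  # the guest executes until it is resumed
  modprobe vmcp        # load the z/VM CP interface module, if needed
  vmcp stop cpu all

  # from a 3270 console logged on to the guest's z/VM user ID:
  #   #CP BEGIN CPU ALL      (resume execution)
  #   #CP DISCONNECT         (disconnect when done)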
7 © 2012 IBM Corporation
- 8. Objectives - Part 1
– Determine the time needed to deactivate/activate the guest
– Show that the deactivated z/VM guest state becomes dormant
– Use a WebSphere Application Server guest and a standalone database
8 © 2012 IBM Corporation
- 9. Time to pause or restart the guests
Times until the guest is halted [sec]

                        standalone database   WebSphere guest
  Linux: suspend               8                    27
  z/VM: stop command       immediately          immediately

Times until the guest is started again [sec]

  Linux: resume                19
  z/VM: begin command      immediately
CP stop/begin is much faster
– In case of an IPL or power loss, the system state is lost
– Disk and memory state of the system might be inconsistent
Linux suspend writes the memory image to the swap device and shuts down the devices
– Needs more time, but the system is left in a very controlled state
– IPL or power loss have no impact on the system state
– All devices are cleanly shut down
9 © 2012 IBM Corporation
- 10. Guest States
[Charts – Suspend/Resume and Stop/Begin: queue state of the guests LNX00105 and
LNX00107 (1 = DISP, 0 = DORM) over time in mm:ss under a DayTrader workload
(WAS/DB2), with the suspend/resume and stop/begin points marked.]
Suspend/Resume behaves ideally: the guest is dormant the entire time!
With Stop/Begin the guest always gets scheduled again
– one reason is that z/VM is still processing interrupts for the virtual NICs (VSWITCH)
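A possible way to observe these queue states, assuming a user ID with sufficient CP privileges (INDICATE QUEUES is a privileged command; class E is assumed here, which is not part of the described test setup):

  # list the guests currently in the dispatch and eligible lists
  vmcp indicate queues exp
  # a guest that appears in the output is being dispatched (DISP);
  # a properly suspended guest should drop out of the list, i.e. it
  # has become dormant (DORM)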
10 © 2012 IBM Corporation
- 11. Objectives - Part 2
– Show whether the pages of the paused guests are really moved to XSTOR
Environment
– 5 guests with WebSphere Application Server and DB2, and 5 standalone database guests (from another
vendor)
– The application server systems are in a cluster environment, which requires an additional guest running the
network deployment manager
– 4 systems of interest: the target systems
• 2 WebSphere + 2 standalone databases
– 4 standby systems: activated to produce memory pressure when the systems of interest are paused
• 2 WebSphere + 2 standalone databases
– 2 base load systems: never paused, to produce a constant base load
• 1 WebSphere + 1 standalone database
11 © 2012 IBM Corporation
- 12. Suspend Resume Test - Methodology
[Charts: real CPU usage in IFLs over time (h:mm) for the three guest groups:
systems of interest (suspended/resumed), standby systems (activated to create
memory pressure), and base load systems (always up). The workload stop,
suspend, and resume points are marked.]
12 © 2012 IBM Corporation
- 13. Suspend/Resume Test – What happens with the pages?
[Charts: number of pages in XSTOR and in real storage per guest (SOI WAS, SOI
standalone DB, standby WAS, standby standalone DB, base load WAS, base load
standalone DB, deployment manager), measured in the middle of the warmup phase,
in the middle of the suspend phase, and at the end phase after resume.]
13 © 2012 IBM Corporation
- 14. Resume Times when scaling guest size
[Chart: time to resume a Linux guest, in seconds, for guest sizes of
2.0 GB / 1 JVM, 2.6 GB / 2 JVMs, 3.9 GB / 4 JVMs, and 5.2 GB / 6 JVMs,
compared with the pure startup time for 6 JVMs.]
The tests so far were done with relatively small guests (< 1 GB memory)
How does the resume time scale when the guest size increases?
– Multiple JVMs were run inside the WAS instance; each JVM had a Java heap size of 1 GB to ensure that the memory
is really used.
– The number of JVMs was scaled with the guest size
– The guest size was limited by the size of the mod9 DASD used as swap device for the memory image
Resume time covers the time from IPL'ing the suspended guest until an HTTP GET succeeds
Startup time covers only the time for executing the startServer command for servers 1 – 6 serially with a script, as
sketched below
► Resume time was always much shorter than just starting the WebSphere application servers
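A minimal sketch of such a serial start script; the WAS profile path and the server names are assumptions, not taken from the test setup:

  #!/bin/bash
  # start servers 1-6 one after the other; startServer.sh blocks until
  # the respective application server is up
  WAS_BIN=/opt/IBM/WebSphere/AppServer/profiles/AppSrv01/bin
  for i in 1 2 3 4 5 6; do
      "$WAS_BIN"/startServer.sh server$i
  done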
14 © 2012 IBM Corporation
- 15. Agenda
Introduction
Methods to pause a z/VM guest
WebSphere Application Server JVM stacking
Guest scaling and DCSS usage
Summary
15 © 2012 IBM Corporation
- 16. Objectives
Which setup can be recommended: one or many JVMs per guest?
Expectations
– Many JVMs per guest are associated with contention inside Linux (memory, CPUs)
– Many guests are associated with more z/VM effort and z/VM overhead (e.g. virtual NICs, VSWITCH)
– Small guests with 1 CPU have no SMP overhead (e.g. no spinlocks)
Methodology
– Use a total of 200 JVMs (WebSphere Application Server instances/profiles)
– Use a pure application server workload without a database
– Scale the number of JVMs per guest and the number of guests accordingly, to always reach a total of 200
– Use one node agent per guest
16 © 2012 IBM Corporation
- 17. Test Environment
Hardware:
– 1 LPAR on z196, 24 CPUs, 200 GB central storage + 2 GB expanded storage
Software
– z/VM 6.1, SLES11 SP1, WAS 7.1
Workload: SOA based workload without database back-end (IBM internal)
Guest Setup:
– no memory overcommitment
#JVMs       #guests   #VCPUs      total     CPU           JVMs       guest memory   total virtual
per guest             per guest   #VCPUs    real : virt   per vCPU   size [GB]      memory size [GB]
200         1         24          24        1 : 1.0       8.3        200            200
100         2         12          24        1 : 1.0       8.3        100            200
50          4         6           24        1 : 1.0       8.3        50             200
20          10        3           30        1 : 1.3       6.7        20             200
10          20        2           40        1 : 1.7       5.0        10             200
4           50        1           50        1 : 2.1       4.0        4              200
2           100       1           100       1 : 4.2       2.0        2              200
1           200       1           200       1 : 8.3       1.0        1              200

CPU overcommitment starts with the 10-guest configuration (30 total VCPUs on 24 real CPUs).
The configurations with 1 VCPU per guest (4 JVMs per guest and less) are uniprocessor setups.
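The derived columns follow directly from the totals: for example, in the last row, 200 guests with 1 virtual CPU each give 200 VCPUs on 24 real CPUs, a real:virtual ratio of 24 : 200 = 1 : 8.3, and 200 JVMs on 200 VCPUs give 1.0 JVM per vCPU; in the first row, 200 JVMs on 24 VCPUs give 200 / 24 ≈ 8.3 JVMs per vCPU.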
17 © 2012 IBM Corporation
- 18. Horizontal versus Vertical WAS Guest Stacking
[Chart: WebSphere JVM stacking, 200 JVMs in total – throughput in page
elements/sec (left Y-axis) and LPAR CPU load in IFLs (right Y-axis) versus
#JVMs per guest, corresponding to 1, 2, 4, 10, 20, and 200 guests. The points
where CPU overcommitment and the uniprocessor setup start, and where the guest
CPU load falls below 1 IFL, are marked.]
Impact of JVM stacking on throughput is moderate
– Minimum = Maximum – 3.5%
– 1 JVM per guest (200 guests) has the highest throughput
– 10 JVMs per guest (20 guests) has the lowest throughput
Impact of JVM stacking on CPU load is heavy
– Minimum = Maximum – 31%; the difference is 4.2 IFLs
– 10 JVMs per guest (20 guests) has the lowest CPU load (9.5 IFLs)
– 200 JVMs per guest (1 guest) has the highest CPU load (13.7 IFLs)
Page reorder
– the impact of page reorder on/off on a 4 GB guest is within normal variation
Setup notes: VM page reorder is off for guests with 10 JVMs and more (> 8 GB memory);
CPU overcommitment starts with 20 JVMs per guest and increases with fewer JVMs per guest;
uniprocessor (UP) setup for 4 JVMs per guest and less; no memory overcommitment.
18 © 2012 IBM Corporation
- 19. Horizontal versus Vertical WAS Guest Stacking - z/VM CPU load
[Charts: WebSphere JVM stacking, 200 JVMs in total – z/VM CPU load in IFLs
versus #JVMs per guest, split into CP time attributed to the guests, Emulation,
and System (CP time not attributed to guests); CP = User – Emulation,
Total = User + System. The right-hand chart zooms into the 0 – 0.8 IFL range.]
Which component causes the variation in CPU load?
– The total CP effort is between 0.4 and 1 IFL
• The system-related part is highest with 1 guest
• The guest-related CP effort is highest with 200 guests
• Both are lowest between 4 and 20 guests
– The major contribution comes from Linux itself (Emulation)
z/VM CPU load consists of:
– Emulation → this runs the guest
– CP effort attributed to the guest → drives virtual interfaces, e.g. VNICs
– System (CP effort attributed to no guest) → pure CP effort
(Setup notes as on the previous slide.)
19 © 2012 IBM Corporation
- 20. Horizontal versus Vertical WAS Guest Stacking -Linux CPU load
[Chart: WebSphere JVM stacking, 200 JVMs in total – Linux CPU load in IFLs
(user, system, steal) versus #JVMs per guest, corresponding to 1, 2, 4, 10, 20,
and 200 guests.]
Which component inside Linux causes the variation in CPU load?
– The major contribution to the reduction in CPU utilization comes from user space, i.e. from inside the WebSphere
Application Server JVM
System CPU
– decreases with the decreasing number of JVMs per system,
– but increases again in the uniprocessor cases
The number of CPUs per guest seems to be important!
20 © 2012 IBM Corporation
- 21. virtual CPU scaling with 20 guests (10 JVMs each) and with 200 guests
[Charts: throughput in page elements/sec and LPAR CPU load in IFLs when scaling
the number of virtual CPUs per guest, for 20 guests with 10 JVMs each (1, 2,
and 3 VCPUs; phys:virt ratios 1:1, 1:1.7, 1:2.5) and for 200 guests with 1 JVM
each (1 and 2 VCPUs; ratios 1:8.3, 1:16.7), showing the LPAR load, the z/VM
Emulation part, and the throughput.]
Scaling the number of virtual CPUs per guest at the point of the lowest LPAR CPU load
– The impact on throughput is moderate (Min = Max – 4.5%); throughput with 1 vCPU is the lowest
– The impact on CPU load is high; 2 virtual CPUs provide the lowest CPU utilization
– The variation is caused by the emulation part of the CPU load => Linux
The number of virtual CPUs seems to have a severe impact on the total CPU load
– The CPU overcommitment level is one factor, but not the only one!
– In this case WebSphere Application Server on Linux runs better with 2 virtual CPUs, as long as the CPU
overcommitment level is not too excessive
21 © 2012 IBM Corporation
- 22. Agenda
Introduction
Methods to pause a z/VM guest
WebSphere Application Server JVM stacking
Guest scaling and DCSS usage
Summary
22 © 2012 IBM Corporation
- 23. Horizontal versus Vertical WAS Guest Stacking – DCSS vs Minidisk
[Charts: WebSphere JVM stacking – throughput and LPAR CPU load in IFLs versus
#JVMs per guest (200 JVMs in total; 1, 2, 4, 10, 20, and 200 guests), comparing
a shared DCSS with a shared minidisk.]
DCSS or shared minidisks?
The impact on throughput is barely noticeable
The impact on CPU is significant (and as expected)
– For small numbers of guests (1 – 4) a minidisk is much cheaper than a DCSS (savings: 1.4 – 2.2 IFLs)
– 10 guests was the break-even point
– With 20 guests and more, the environment with the DCSS needs less CPU (savings: 1.5 – 2.2 IFLs); a usage sketch
follows below
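For reference, a minimal sketch of how a DCSS can be used as a shared read-only block device in Linux via the dcssblk driver; the segment name WASDCSS and the mount point are assumptions, not the names used in the tests:

  # load the DCSS block device driver and attach the segment
  modprobe dcssblk
  echo WASDCSS > /sys/devices/dcssblk/add
  # access the segment in shared mode (one copy in real memory for all guests)
  echo 1 > /sys/devices/dcssblk/WASDCSS/shared
  # mount it like a normal read-only disk
  mount -o ro /dev/dcssblk0 /opt/shared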
23 © 2012 IBM Corporation
- 24. Agenda
Introduction
Methods to pause a z/VM guest
WebSphere Application Server JVM stacking
Guest scaling and DCSS usage
Summary
24 © 2012 IBM Corporation
- 25. Summary – Pause a z/VM guest
“No work there” is not sufficient for a WebSphere guest to become dormant
– probably not specific to WebSphere
Deactivating the guest helps z/VM to identify guest pages which can be moved out to the paging devices, so that
the real memory can be used for other guests
– two methods
• Linux suspend mechanism (hibernates Linux)
• z/VM stop command (stops the virtual CPUs)
– Allows increasing the possible level of memory and CPU overcommitment
Linux suspend mechanism
– takes 10 – 30 sec to hibernate an 850 MB guest, 20 – 30 sec to resume (1 – 5 GB guest)
– controlled halt
– the suspended guest is safe!
z/VM stop/begin command
– the system reacts immediately
– guest memory is lost in case of an IPL or power loss
– still some activity for virtual devices
There are scenarios which are not eligible for guest deactivation (e.g. HA environments)
For additional information on methods to pause a z/VM guest:
– http://www.ibm.com/developerworks/linux/linux390/perf/tuning_vm.html#ruis
25 © 2012 IBM Corporation
- 26. Summary – scaling JVMs per guest
Question: one or many WAS JVMs per guest?
Answer:
In regard to throughput: it doesn't matter
In regard to CPU load: it matters heavily
– The difference between maximum and minimum is about 4 IFLs(!) at a maximum total of 13.7 IFLs
– The variation in CPU load is caused by user space CPU in the guest
– The z/VM overhead is small and has its maximum at both ends of the scaling
– 2 virtual CPUs per guest provide the lowest CPU load for this workload
Sizing recommendation
– Do not use more virtual CPUs than required
– As long as the CPU overcommitment level does not become too high, use at minimum two virtual CPUs per WebSphere
system
DCSS vs shared disk
– There is only a difference in CPU load, but it reaches the range of 1 – 2 IFLs
– For less than 10 guests a shared disk is recommended
– For more than 20 guests a DCSS is the better choice
For additional information
– WebSphere JVM stacking: http://www.ibm.com/developerworks/linux/linux390/perf/tuning_vm.html#hv
– page reorder: http://www.vm.ibm.com/perf/tips/reorder.html
26 © 2012 IBM Corporation
- 28. Stop/Begin Test – What happens with the pages?
[Charts: number of pages in XSTOR and in real storage per guest (SOI WAS, SOI
standalone DB, standby WAS, standby standalone DB, base load WAS, base load
standalone DB, deployment manager) for the stop/begin test, measured in the
middle of the warmup phase, in the middle of the suspend phase, and at the end
phase after resume.]
28 © 2012 IBM Corporation