1. Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
2. Debugging and Configuration Best Practices for Oracle Linux
Greg Marsden
Senior Director, Linux and Virtualization
3. Agenda
Key Linux Tips and Tricks
Common Issues
Diagnostic Tools and Use Cases
Do-It-Yourself Debugging
Ksplice in the Datacenter
4. Tips and Tricks:
Key Points
5. Key Linux Tips and Tricks
Best Performance and Reliability
Kernel Tuning: Oracle Preinstall RPM
Diagnostic Tools: kdump and oswatcher
6. Oracle Preinstall Package and Templates
Configure Oracle Products Automatically
oracle-rdbms-server-11gR2-preinstall-1.0.6.el6.x86_64.rpm
Per-Product Preconfiguration Package
– Based on Validated Configurations' real-world stack testing
– Includes recommendations from the product Release Notes
– Installs required dependencies and kernel tuning parameters
– One package for each Oracle product
Oracle VM Template for Oracle RDBMS Server
– Production-ready, installed virtual machine templates from eDelivery
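The kernel tuning parameters such a package installs end up as ordinary sysctl settings. As a hedged illustration only, the Python sketch below parses sysctl.conf-format text and reports keys whose live value under /proc/sys differs from the file. It does not reproduce the package's actual values, and the /etc/sysctl.conf path is an assumption about where the settings live.

```python
"""Sketch: compare sysctl.conf-style settings against the running kernel."""
import os


def parse_sysctl_conf(text):
    """Return {key: value} from sysctl.conf-format text, skipping comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if "=" not in line:
            continue  # not a "key = value" line; ignore it
        key, value = line.split("=", 1)
        # Collapse internal whitespace so "9000 65500" and "9000\t65500" compare equal.
        settings[key.strip()] = " ".join(value.split())
    return settings


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl name such as kernel.shmmni to its /proc/sys file."""
    return os.path.join(root, *key.split("."))


def drift(expected, root="/proc/sys"):
    """Yield (key, expected, actual) for settings that differ from the live value."""
    for key, want in sorted(expected.items()):
        try:
            with open(sysctl_path(key, root)) as f:
                have = " ".join(f.read().split())
        except OSError:
            have = None  # parameter does not exist on this kernel
        if have != want:
            yield key, want, have


def main(conf_path="/etc/sysctl.conf"):
    """Print every setting in conf_path whose live value differs (assumed path)."""
    with open(conf_path) as f:
        expected = parse_sysctl_conf(f.read())
    for key, want, have in drift(expected):
        print(f"{key}: file says {want!r}, kernel has {have!r}")
```

Calling `main()` on the host lists only the parameters that have drifted, which is a quick way to notice settings lost after an upgrade or a manual edit.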
7. System Diagnostics
Critical diagnostic software should run at all times
oswatcher utility: Install it and leave it running to collect information about system activity over time.
Serial console or netconsole: Remotely monitor system activity in the event of a disk, network, or system outage.
kexec crash collection utilities: Gather forensic information from malfunctioning systems.
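A quick way to confirm that the console and crash-collection pieces are in place is to inspect the running kernel's command line and a few sysfs entries. The Python sketch below is read-only and assumes a kernel that exposes /sys/kernel/kexec_crash_loaded, a serial console configured with a console=ttyS0,115200 style argument, and netconsole built as a module (so it appears under /sys/module when loaded).

```python
"""Sketch: check that crash collection and a remote console are configured."""
import os


def parse_cmdline(text):
    """Split /proc/cmdline into (key, value) pairs; value is None for bare flags."""
    pairs = []
    for token in text.split():
        key, sep, value = token.partition("=")
        pairs.append((key, value if sep else None))
    return pairs


def audit(cmdline_text, sys_root="/sys"):
    """Return a dict of findings about kdump and console setup."""
    pairs = parse_cmdline(cmdline_text)
    consoles = [v for k, v in pairs if k == "console" and v]
    crashkernel = next((v for k, v in pairs if k == "crashkernel"), None)
    loaded_path = os.path.join(sys_root, "kernel", "kexec_crash_loaded")
    try:
        with open(loaded_path) as f:
            crash_loaded = f.read().strip() == "1"
    except OSError:
        crash_loaded = None  # file absent: no kexec support, or not determinable
    return {
        "crashkernel": crashkernel,  # memory reserved for the kdump kernel
        "crash_kernel_loaded": crash_loaded,
        "serial_console": any(c.startswith("ttyS") for c in consoles),
        "netconsole_loaded": os.path.isdir(os.path.join(sys_root, "module", "netconsole")),
    }


def main():
    """Print the findings for the running system."""
    with open("/proc/cmdline") as f:
        for name, value in audit(f.read()).items():
            print(f"{name}: {value}")
```

A `None` crashkernel or a `False` for both console checks on a production host is exactly the gap this slide warns about.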
8. Tips and Tricks:
Memory Management
9. “Help! My system has 250 GB of RAM and I'm running out of memory! My consultants are telling me we can't scale with a 120 GB SGA and this many connections, but I can't fit any more RAM in this system.”
Anonymous DBA
Oracle User
10. Issue: Not using Hugepages
Frequent Issue
• Symptom: Out of Memory errors, slow performance. Detected via oswatcher.
• Cause: SGA mapped in 4 KB pages instead of 2 MB pages.
• Solution: Use Hugepages.
• Hugepages are faster.
• Hugepages are “pinned” and won't be swapped.
I found the following:
13:09:19  57591060k free   159 client connections
13:26:01  26189944k free  1826 client connections
13:32:31  15547144k free  2024 client connections
13:57:00    467048k free  2037 client connections
(Here is where we begin swapping memory to disk.)
I also found this:
zzz ***Fri Aug 9 13:23:22 PDT 2011
MemTotal: 250 GB
MemFree: 464 MB
PageTables: 112 GB
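The PageTables figure follows from simple arithmetic. Assuming x86-64, each leaf page-table entry is 8 bytes and maps one 4 KB page, while with 2 MB hugepages a single 8-byte entry one level up maps the whole 2 MB. The sketch below estimates the page-table memory one process needs to map an entire SGA; it ignores upper-level tables and assumes the process touches the whole SGA, so treat it as an upper-bound illustration.

```python
"""Worked example: per-process page-table cost of mapping an SGA."""
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
PTE_BYTES = 8  # size of one x86-64 page-table entry


def pagetable_bytes(sga_bytes, page_bytes):
    """Bytes of page-table entries needed to map sga_bytes using page_bytes pages."""
    pages = -(-sga_bytes // page_bytes)  # ceiling division
    return pages * PTE_BYTES


if __name__ == "__main__":
    sga = 120 * GIB
    small = pagetable_bytes(sga, 4 * KIB)
    huge = pagetable_bytes(sga, 2 * MIB)
    print(f"4 KB pages: {small // MIB} MiB per process")  # 240 MiB
    print(f"2 MB pages: {huge // KIB} KiB per process")   # 480 KiB
```

At roughly 240 MiB per fully mapped connection, fewer than 500 such connections would be enough to account for 112 GB of PageTables, while the hugepage case needs 512 times less.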
11. Performance with Hugepages
Metric (200 connections, 12.9 GB SGA)     Without Hugepages   With Hugepages
PageTables before DB startup              7400 kB             7748 kB
PageTables after DB startup               652900 kB           21288 kB
PageTables after 200 PQ slaves run query  6189248 kB          80564 kB
Time to complete                          00:10:23.60         00:00:18.77
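The ratios behind this comparison can be computed directly from the figures above; the small helper converts the hh:mm:ss.ss timings to seconds. This only restates the slide's measurements and is not a new benchmark.

```python
"""Worked example: ratios from the Hugepages comparison."""


def to_seconds(hms):
    """Convert an 'hh:mm:ss.ss' string to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


without_kb, with_kb = 6189248, 80564  # PageTables after the PQ query, in kB
without_t = to_seconds("00:10:23.60")
with_t = to_seconds("00:00:18.77")

print(f"PageTables reduction: {without_kb / with_kb:.1f}x")  # 76.8x
print(f"Query speedup: {without_t / with_t:.1f}x")            # 33.2x
```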
12. Hugepages and Transparent Hugepages
Performance for DB and Middleware Applications
“Regular” Hugepages [Ref. Doc ID 749851.1]
– Reduce footprint of individual Oracle database connections.
– Increase performance and scalability.
– Requires manual tuning after SGA changes, and does not work with AMM.
Transparent Hugepages
– Transparent hugepages do not help the RDBMS use case.
– Auto-allocate hugepages for large memory allocations. Great for Java/middleware/applications.
– New for UEK and OL6!
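Sizing regular Hugepages comes down to dividing the SGA by the hugepage size reported in /proc/meminfo; the result is a lower bound for vm.nr_hugepages. The slide's reference, Doc ID 749851.1, is the authoritative procedure; the Python sketch below is a simplified stand-in that assumes a single SGA of known size. It also reads which transparent hugepage mode is selected, assuming the mainline /sys/kernel/mm/transparent_hugepage path (some Red Hat Compatible Kernels use a redhat_transparent_hugepage directory instead).

```python
"""Sketch: estimate vm.nr_hugepages for an SGA and report the THP mode."""
import re


def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo-format text (values mostly in kB)."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"(\S+):\s+(\d+)", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def hugepages_for_sga(sga_bytes, hugepagesize_kb):
    """Smallest number of hugepages that covers the whole SGA."""
    page_bytes = hugepagesize_kb * 1024
    return -(-sga_bytes // page_bytes)  # ceiling division


def thp_mode(text):
    """Return the bracketed choice from a transparent_hugepage 'enabled' file."""
    m = re.search(r"\[(\w+)\]", text)
    return m.group(1) if m else None


def main(sga_bytes):
    """Print a minimum vm.nr_hugepages for sga_bytes and the current THP mode."""
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print("vm.nr_hugepages >=", hugepages_for_sga(sga_bytes, info["Hugepagesize"]))
    try:
        with open("/sys/kernel/mm/transparent_hugepage/enabled") as f:
            print("THP mode:", thp_mode(f.read()))
    except OSError:
        print("THP control file not found at the assumed path")
```

Because the count has to be recomputed whenever the SGA changes, this also illustrates why regular Hugepages require manual tuning after SGA changes.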
13. Issue: Slow Performance at 95% RAM
OL5-specific issue with large memory allocations
Symptoms
– System is swapping and shows low free/cached memory
– Reduced system performance
Cause: Usually the kernel is consuming CPU in try_to_free_pages() while reclaiming the pagecache and inactive lists.
Solution
– Ensure you are running a kernel with the shrink_zone patch: UEK, OL6, or OL5 with the fix for BUG 6086839
– If the system is swapping but performance is acceptable, add more RAM.
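To spot a system approaching this state, one hedged approach is to compute how much RAM is neither free nor page cache and compare it with the 95% mark from this slide's title. The sketch treats MemFree + Buffers + Cached as reclaimable, which is an approximation (Cached also counts shared memory that cannot simply be dropped), and the threshold is taken from the slide rather than from any kernel setting.

```python
"""Sketch: flag when non-cache memory use nears 95% of RAM."""


def parse_meminfo(text):
    """Return {field: kB} from /proc/meminfo-format text."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def used_fraction(info):
    """Fraction of MemTotal that is not free, buffers, or page cache."""
    reclaimable = info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)
    return (info["MemTotal"] - reclaimable) / info["MemTotal"]


def main(threshold=0.95):
    """Print memory and swap usage, warning above the threshold."""
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    frac = used_fraction(info)
    swap_used = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    print(f"{frac:.1%} of RAM in use, {swap_used} kB of swap in use")
    if frac >= threshold:
        print("Above threshold: check for reclaim overhead and swapping")
```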
14. Issue: System Swapping with Free Memory
NUMA-Specific Problem
Symptom: System starts to swap while reporting free RAM
– vmstat reports free memory.
– dmesg has “order 5 allocation failed” messages.
– If allocations below order 5 are failing, there are larger issues.
Cause: Memory fragmentation. On NUMA systems, the node-local memory used for kernel allocations becomes fragmented.
Solutions:
– Disable NUMA
– Decrease the MTU size if using jumbo frames
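An order-5 allocation needs 2^5 = 32 physically contiguous pages (128 KB with 4 KB pages). /proc/buddyinfo shows, per NUMA node and zone, how many free blocks of each order remain, so fragmentation appears as plenty of low-order blocks but few or none at order 5 and above. The parser below assumes the usual buddyinfo layout of one column per order, starting at order 0.

```python
"""Sketch: summarize high-order free blocks per node/zone from /proc/buddyinfo."""


def parse_buddyinfo(text):
    """Return {(node, zone): [free blocks at order 0, 1, 2, ...]}."""
    result = {}
    for line in text.splitlines():
        # Example line: "Node 0, zone   Normal   4325   1823    702 ..."
        head, _, counts = line.partition("zone")
        if not counts:
            continue
        node = int(head.replace("Node", "").strip().rstrip(","))
        fields = counts.split()
        result[(node, fields[0])] = [int(n) for n in fields[1:]]
    return result


def free_blocks_at_or_above(counts, order):
    """Number of free blocks large enough to satisfy one allocation of this order."""
    return sum(counts[order:])


def main():
    """Print the order>=5 free block count for every node and zone."""
    with open("/proc/buddyinfo") as f:
        zones = parse_buddyinfo(f.read())
    for (node, zone), counts in sorted(zones.items()):
        print(f"node {node} {zone:8s} order>=5 free blocks: "
              f"{free_blocks_at_or_above(counts, 5)}")
```

A node showing zero order-5-or-larger blocks while vmstat still reports free memory matches the symptom described on this slide.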
15. Memory Accounting
What is Linux Doing With My Memory?
Effectively free memory = Free + Cached
– Linux uses all otherwise-free memory for the pagecache.
– This behavior cannot be disabled.
Process shared memory is hard to account for in Linux
– RSS double-counts shared memory; the total (virtual) size includes pages that are not mapped into RAM.
– Use /proc/<pid>/smaps to see real process memory usage.
cgroups: New features in the latest kernels let you restrict RAM use
– Useful to throttle pagecache use by backup processes
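The /proc/<pid>/smaps approach is easy to script. The sketch sums the per-mapping counters for one process; Pss (proportional set size) divides each shared page among the processes that map it, which avoids the RSS double counting described above. It assumes a kernel that reports Pss in smaps; on a kernel without that field the Pss total simply comes out as 0.

```python
"""Sketch: total a process's memory from /proc/<pid>/smaps."""
from collections import Counter

FIELDS = ("Rss", "Pss", "Shared_Clean", "Shared_Dirty", "Private_Clean", "Private_Dirty")


def smaps_totals(text):
    """Sum the kB counters of interest across all mappings in smaps text."""
    totals = Counter()
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in FIELDS:
            totals[name] += int(rest.split()[0])  # values are reported in kB
    return totals


def main(pid="self"):
    """Print the summed counters for one process (default: this process)."""
    with open(f"/proc/{pid}/smaps") as f:
        totals = smaps_totals(f.read())
    for name in FIELDS:
        print(f"{name:14s} {totals[name]:>10d} kB")
```

For an Oracle server process, comparing Rss with Pss shows how much of its apparent footprint is really the shared SGA.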
16. Swap: What is it good for?
Tuning Swap Space
Swap is a highly contentious topic on Linux
– Benefit: Allows “room to grow” for inadequately sized systems.
– Drawback: Much slower than memory access; often makes problems worse.
Recommendation: Use swap, but ensure I/O to the swap disk is kept close to zero.
[Figure: vmstat output]
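One way to check that swap I/O stays close to zero without reading vmstat by eye is to sample the kernel's cumulative swap counters. The sketch reads pswpin and pswpout (pages swapped in and out since boot) from /proc/vmstat twice and reports the rate; these counters are closely related to vmstat's si/so columns, and the 5-second interval is an arbitrary choice.

```python
"""Sketch: measure swap-in/swap-out rates from /proc/vmstat."""
import time


def read_vmstat(path="/proc/vmstat"):
    """Return /proc/vmstat as {counter: int}."""
    with open(path) as f:
        return {name: int(value) for name, value in (line.split() for line in f)}


def swap_rates(before, after, seconds):
    """Pages per second swapped in and out between two samples."""
    return {key: (after[key] - before[key]) / seconds for key in ("pswpin", "pswpout")}


def main(interval=5.0):
    """Sample twice, interval seconds apart, and print the swap rates."""
    first = read_vmstat()
    time.sleep(interval)
    rates = swap_rates(first, read_vmstat(), interval)
    print(f"swap in: {rates['pswpin']:.1f} pages/s, swap out: {rates['pswpout']:.1f} pages/s")
```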
17. Tips and Tricks:
General Recommendations
18. Other Configuration Trouble
Assorted Common Configuration Issues
Use UUID to Mount System Disks
– Symptom: System panics after upgrade
– Cause: New hardware, drivers, or kernel reorder device discovery
– Caution: May not work with LVM snapshots
NFS Locks Not Released on Reboot
– Cause: Kernel and DNS have different hostnames
– Solution: Ensure the kernel hostname is fully qualified. See BUG 3156942.
Cluster Reboots with OCFS2
– Cause: Network or disk outages can cause OCFS2 to fence nodes
– Solution: Ensure OCFS2 timeouts are greater than storage/network failover timeouts. The default o2cb heartbeat timeouts may be too short.
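Two of these checks can be scripted. The first helper flags /etc/fstab entries that mount by kernel device name (/dev/sdX and similar) rather than UUID= or LABEL=. The second assumes the relation documented for OCFS2's O2CB_HEARTBEAT_THRESHOLD, where the disk heartbeat timeout is (threshold - 1) × 2 seconds, and finds the smallest threshold whose timeout exceeds a given storage failover time; verify that relation against the user guide for your OCFS2 version.

```python
"""Sketch: fstab device-name check and O2CB heartbeat threshold arithmetic."""

# Kernel device-name prefixes whose numbering can change when discovery order changes.
UNSTABLE_PREFIXES = ("/dev/sd", "/dev/hd", "/dev/vd", "/dev/xvd")


def device_name_mounts(fstab_text):
    """Return (device, mount point) for entries not using UUID=, LABEL=, or stable paths."""
    flagged = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0].startswith(UNSTABLE_PREFIXES):
            flagged.append((fields[0], fields[1] if len(fields) > 1 else ""))
    return flagged


def o2cb_timeout_seconds(threshold):
    """Disk heartbeat timeout for a threshold, assuming timeout = (threshold - 1) * 2."""
    return (threshold - 1) * 2


def min_o2cb_threshold(failover_seconds):
    """Smallest threshold whose timeout is strictly longer than the failover time."""
    return int(failover_seconds // 2) + 2


def main(fstab_path="/etc/fstab", failover_seconds=120):
    """Report device-name mounts and the threshold needed for a given failover time."""
    with open(fstab_path) as f:
        for device, mountpoint in device_name_mounts(f.read()):
            print(f"{mountpoint}: mounted by device name {device}; consider UUID=")
    t = min_o2cb_threshold(failover_seconds)
    print(f"failover {failover_seconds}s needs O2CB_HEARTBEAT_THRESHOLD >= {t}")
```

The 120-second failover default in `main()` is a hypothetical input; substitute the multipath or network failover time measured in your environment.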
19. Performance Tuning Kernel Parameters
System Scheduling
vm.swappiness
– 100: Force aggressive swapping
– 0: Prefer reclaiming pagecache over swapping; insurance against a backup process hogging all system memory
Network Protocol Buffers
– net.core.wmem_default/max: Default/maximum buffer size for outgoing network packets.
– net.core.rmem_default/max: Default/maximum buffer size for incoming packets. If these values are set too small, the system may drop TCP packets.
Memory Management
– vm.dirty_ratio: Lower values encourage more frequent pagecache writeback
– vm.lowmem_reserve_ratio/vm.min_free_kbytes: Reserve physical memory for kernel allocations
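Two back-of-the-envelope calculations help when choosing values for these parameters. For one connection to keep a link busy, its buffer generally needs to hold at least the bandwidth-delay product (link speed × round-trip time). vm.dirty_ratio is a percentage, so the absolute amount of dirty pagecache it allows grows with RAM. The dirty estimate below uses total RAM for simplicity; the kernel applies the ratio to the memory it considers dirtyable, so the real threshold is usually lower. The link speed, RTT, RAM size, and ratio are illustrative inputs, not recommendations.

```python
"""Worked examples for network buffer and dirty-page sizing (illustrative inputs)."""


def bdp_bytes(bits_per_second, rtt_seconds):
    """Bandwidth-delay product: bytes in flight needed to keep a link busy."""
    return bits_per_second * rtt_seconds / 8


def dirty_limit_bytes(mem_bytes, dirty_ratio_percent):
    """Approximate dirty pagecache allowed before writers are throttled."""
    return mem_bytes * dirty_ratio_percent / 100


if __name__ == "__main__":
    # 10 Gbit/s link with a 1 ms round trip
    print(f"BDP: {bdp_bytes(10e9, 0.001) / 1e6:.2f} MB")  # 1.25 MB
    # 250 GB of RAM with vm.dirty_ratio = 20
    print(f"dirty limit: {dirty_limit_bytes(250e9, 20) / 1e9:.0f} GB")  # 50 GB
```

On a large-memory host, tens of gigabytes of dirty data can accumulate before writeback is forced, which is why lowering vm.dirty_ratio leads to smaller, more frequent writeback.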
20. Bugs Fixed in Enterprise Linux
Oracle Linux 5 Bug Fixes
Oracle finds and fixes critical bugs in Enterprise Linux.
– Red Hat Compatible Kernel vs. Oracle-modified kernel
– Install the Compatible Kernel for bug-for-bug compatibility with RHEL
Patches required for correct Oracle product operation
21. Oracle Patches Linux
Specially Tuned Linux Kernels for Customer Requirements
Staying up-to-date with your Linux distribution is very important.
Bug-Fixed Oracle Linux Kernel
UEK: Unbreakable Enterprise Kernel
– Top-performing kernel. World Record TPC-C benchmark.
– Provides OL6 performance on OL5 systems.
Backporting fixes is a temporary solution, not a permanent one.
– Always plan to update (or use Ksplice) to the latest kernel version.
22. UEK: Modern Linux for Oracle
Fast, Modern, Reliable
Get the latest in performance and features from Linux, tested by Oracle.
– All-new kernel, optimized for Oracle.
Stay closer to mainline Linux, with patches that improve performance for Oracle workloads.
– All patches are open source and submitted to mainline Linux
– Patches provided via RPM and via Ksplice
– World Record TPC-C benchmark, March 2012.
23. Tips and Tricks:
Diagnostics
24. Issue: Post-Event Diagnostics
What to do after a crash or hang?
Hard hangs
– Panic, OOPS, nmi_watchdog
– “Spontaneous Reboot”
Brownouts
– Performance Degradation
Cluster Scenarios
– Network or Disk may have gone away, triggering the fence
– Need to preserve crash data in the event of a network or disk loss
– Ensure timeouts (like OCFS2) are set correctly
25. Two Kinds of Critical OS Logging
Continuous OS Logging
– oswatcher: continuous logging; collects timestamped snapshots of system commands:
  ps, top
  slabinfo, meminfo
  vmstat, mpstat, iostat
– Other tools can be employed as well, like sar or collectl.
Panic/Hang Event Logging
– A serial console or netconsole should always be set up for any production system. No exceptions.
– Consoles also preserve sysrq data.
– kdump: system memory image collection.
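oswatcher itself is a collection of shell scripts. To illustrate the idea of continuous, timestamped snapshots, here is a minimal Python stand-in. It assumes the listed commands are installed, writes each snapshot under a "zzz ***" timestamp header similar to the one in the oswatcher output shown earlier, and performs no archive rotation, so it is a sketch rather than a replacement.

```python
"""Sketch: oswatcher-style periodic snapshots of system commands."""
import subprocess
import time
from datetime import datetime

# Commands to snapshot; vmstat and top are assumed to be installed (procps).
COMMANDS = {
    "vmstat": ["vmstat", "1", "3"],
    "meminfo": ["cat", "/proc/meminfo"],
    "top": ["top", "-b", "-n", "1"],
}


def header(now):
    """Timestamp line written before each snapshot."""
    return "zzz ***" + now.strftime("%a %b %d %H:%M:%S %Y")


def snapshot(name, argv, logdir="."):
    """Run one command and append its output under a timestamp header."""
    result = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True)
    with open(f"{logdir}/{name}.log", "a") as log:
        log.write(header(datetime.now()) + "\n" + result.stdout)


def main(interval=30, logdir="."):
    """Snapshot every command forever, sleeping interval seconds between rounds."""
    while True:
        for name, argv in COMMANDS.items():
            snapshot(name, argv, logdir)
        time.sleep(interval)
```

The 30-second default interval is arbitrary; the point is that the history already exists when a brownout has to be explained after the fact.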
26. Console Logs
Finding Faults if Disk or Networking Fail
Kernel messages may not be available after a crash
– Serial consoles are proven technology for preserving console output
How to capture serial output:
– Reliable ways to capture serial output:
  ILOM virtual console
  Serial-over-LAN (BIOS configuration)
  Inexpensive DB9-to-USB converter or serial concentrator
– Unreliable ways to capture serial output:
  Physically attached terminal with 'setterm -blank 0' and the system not configured to reboot
  netconsole (can be difficult to configure, and subject to network outages)
Things to check:
– Ensure the baud rate is high enough (not 9600 baud)
– For a virtual console, ensure console history is set up to capture a large amount of output
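The baud-rate advice is easy to quantify. With common 8N1 framing, each byte costs 10 bits on the wire (start bit, 8 data bits, stop bit), so a serial line carries roughly baud/10 bytes per second. The sketch estimates how long a 1 MB SysRq-T stack dump (the size quoted on these slides; real output varies) takes to drain at two rates.

```python
"""Worked example: time to push console output through a serial line (8N1)."""

BITS_PER_BYTE_8N1 = 10  # start bit + 8 data bits + 1 stop bit


def drain_seconds(nbytes, baud):
    """Seconds needed to transmit nbytes at the given baud rate with 8N1 framing."""
    return nbytes * BITS_PER_BYTE_8N1 / baud


if __name__ == "__main__":
    dump = 1024 * 1024  # ~1 MB of SysRq-T output
    for baud in (9600, 115200):
        print(f"{baud:>6} baud: {drain_seconds(dump, baud) / 60:.1f} minutes")
```

That is roughly 18 minutes at 9600 baud versus about 1.5 minutes at 115200, long enough at the lower rate for a hung or rebooting system to lose the tail of the dump.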
27. Keyboard and Console Diagnostics
SysRq Key Combinations for Diagnostics
Magic SysRq keys:
– M: dump system memory statistics
– P, W: dump the stack for all processors
– T: dump the kernel stack trace for all processes
– C: immediately cause a system crash
– S … U … B: emergency sync all disks, remount disks read-only, reboot
How to invoke Magic SysRq:
– Console: Alt + SysRq + <cmd>
– Serial console: <Break> <cmd>
– Command line: echo t > /proc/sysrq-trigger
– Oracle VM dom0: xm sysrq <domain ID> <cmd>
– Ensure kernel.sysrq = 1
Some of these operations (like the stack trace) dump a lot of data (1 MB or more!).
These operations take full priority in the kernel. Do not run them in your monitoring scripts; use them carefully!
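The command-line method can be wrapped so that a script cannot fire a destructive key by accident. The sketch checks the key against a hand-picked set of disruptive SysRq commands (crash, reboot, power off, remount read-only, process-killing, and SAK keys) before writing it to /proc/sysrq-trigger, which requires root. That set is an assumption to review against your kernel's sysrq documentation, and only letter keys are handled (not the log-level digits).

```python
"""Sketch: guarded writes to /proc/sysrq-trigger."""

# Hand-picked disruptive keys: b reboot, c crash, o power off, u remount read-only,
# e/i/f kill processes, k SAK. Review before relying on this list.
DESTRUCTIVE = set("bcoueifk")


def is_allowed(key, allow_destructive=False):
    """True if key is a single SysRq letter that this policy permits."""
    if len(key) != 1 or not key.isalpha():
        return False
    return allow_destructive or key.lower() not in DESTRUCTIVE


def send_sysrq(key, allow_destructive=False, trigger="/proc/sysrq-trigger"):
    """Write one SysRq command; its output goes to the kernel log (dmesg/console)."""
    if not is_allowed(key, allow_destructive):
        raise ValueError(f"refusing SysRq key {key!r}")
    with open(trigger, "w") as f:
        f.write(key)
```

For example, `send_sysrq("m")` dumps memory statistics, while `send_sysrq("c")` raises unless `allow_destructive=True` is passed explicitly.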
28. Diagnostic/Destructive Kernel Parameters
Enable Keyboard- and Console-based Debugging
kernel.sysrq=1
Always set this to enable debug commands
System-wide Events
vm.panic_on_oom: Panic on an Out of Memory condition
(The alternative is for the OOM killer to kill a high-memory process)
kernel.panic_on_oops: Panic on a kernel oops
(If off, the system may survive an oops, but its state may be inconsistent)
Per-Process Events
kernel.hung_task_timeout_secs: Warn if a task stays in uninterruptible (D) state without being scheduled for this many seconds.
Can cause a lot of log messages; not usually useful
kernel.hung_task_panic: Print a stack trace and panic the system if the timeout is hit
Can be useful for debugging. Not good to set by default.
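Some of these parameters may not exist on older kernels, so a read-only check is a safe first step before changing anything. The sketch reads each parameter's current value from /proc/sys and reports missing ones as None.

```python
"""Sketch: report current values of the debug-related sysctls on this slide."""
import os

DEBUG_SYSCTLS = (
    "kernel.sysrq",
    "vm.panic_on_oom",
    "kernel.panic_on_oops",
    "kernel.hung_task_timeout_secs",
    "kernel.hung_task_panic",
)


def read_sysctl(name, root="/proc/sys"):
    """Return a sysctl's value as a string, or None if this kernel lacks it."""
    path = os.path.join(root, *name.split("."))
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def main():
    """Print each debug sysctl and its current value."""
    for name in DEBUG_SYSCTLS:
        print(f"{name} = {read_sysctl(name)}")
```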
29. Crash Kernel Memory Snapshots
Set Your System to Automatically Dump Core
kdump: uses the Linux kexec mechanism to save the crashed kernel's memory, including kernel stacks, after a panic
– The only way to get diagnostic data if the disk or network is not available.
– Boots a kernel held in a reserved (protected) memory area, which saves the crashed kernel's memory.
Very common errors:
– Not testing kdump: it requires specific memory tuning (crashkernel=) and the right HBA or network drivers.
– Not having dedicated space for crash dumps, preferably outside your root partition. Remember, an uncompressed vmcore is as large as physical memory.
– Local disk is faster and more reliable than network dumps.
– Use gzip or `makedumpfile` to compress cores before upload.
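Two of these points lend themselves to a short script: checking that the dump location has room for a full-size vmcore, and compressing a core before upload. The sketch uses Python's shutil.disk_usage and gzip modules. The /var/crash path is a common kdump default but an assumption here, the 10% margin is arbitrary, and makedumpfile's page filtering, which can shrink a dump much further, is not reproduced.

```python
"""Sketch: vmcore space check and streaming gzip compression."""
import gzip
import shutil


def has_room_for_vmcore(free_bytes, memtotal_kb, margin=1.1):
    """True if free space covers physical memory plus a safety margin (assumed 10%)."""
    return free_bytes >= memtotal_kb * 1024 * margin


def memtotal_kb(meminfo_path="/proc/meminfo"):
    """Read MemTotal (kB) from /proc/meminfo."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1])
    raise ValueError("MemTotal not found")


def compress_core(src, dst=None):
    """Gzip a core file in 16 MiB chunks without loading it into memory."""
    dst = dst or src + ".gz"
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, length=16 * 1024 * 1024)
    return dst


def main(dump_dir="/var/crash"):
    """Report whether dump_dir can hold an uncompressed vmcore of this system."""
    free = shutil.disk_usage(dump_dir).free
    print("room for a full vmcore:", has_room_for_vmcore(free, memtotal_kb()))
```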
30. Reading a Kernel Core
Using the Crash Utility
Crash `bt` backtrace or SysRq-T stack dump
– Get debug symbols from oss.oracle.com
Red flags:
– Many processes in D state (uninterruptible, usually waiting on I/O)
– Many processes in the same kernel routine (contention?)
Caution: Stack traces can be 1 MB or larger. Don't do this frequently.
[Figure: dmesg output after SysRq-T]
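The same red flags can be checked on a live system without a core: /proc/<pid>/stat gives each task's state and /proc/<pid>/wchan names the kernel routine it is sleeping in. The sketch counts D-state processes per wchan. It complements crash's `bt` rather than replacing it, only looks at thread-group leaders (not every thread), and assumes wchan is readable; some kernels restrict it or report 0.

```python
"""Sketch: count D-state (uninterruptible) processes grouped by kernel wait routine."""
import os
from collections import Counter


def task_state(stat_text):
    """State letter from /proc/<pid>/stat; comm may contain spaces, so split after ')'."""
    return stat_text.rsplit(")", 1)[1].split()[0]


def d_state_by_wchan(tasks):
    """tasks: iterable of (state, wchan). Returns a Counter of wchan for D-state tasks."""
    return Counter(wchan or "?" for state, wchan in tasks if state == "D")


def scan(proc="/proc"):
    """Yield (state, wchan) for every readable process (thread-group leaders only)."""
    for pid in filter(str.isdigit, os.listdir(proc)):
        try:
            with open(f"{proc}/{pid}/stat") as f:
                state = task_state(f.read())
            with open(f"{proc}/{pid}/wchan") as f:
                wchan = f.read().strip()
        except OSError:
            continue  # process exited or access denied
        yield state, wchan


def main():
    """Print the ten most common wait routines among D-state processes."""
    for wchan, count in d_state_by_wchan(scan()).most_common(10):
        print(f"{count:5d}  {wchan}")
```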
31. Diagnostic Tools for Brownouts
Getting more out of your diagnostic tools
strace -ttT: Diagnose slow processes
– -tt timestamps each system call (with microseconds); -T shows the time spent in each call
– Useful for diagnosing syscall latency in a specific process
– Also helpful to determine whether a problem is in the kernel or in user mode
Crash utility on virtual machines
– `xm dump-core` takes a non-invasive kernel snapshot of a system
– Provides memory, stack traces, and kernel logs
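With -ttT, each strace line starts with a wall-clock timestamp and ends with the time spent in the call in angle brackets, so slow calls can be extracted mechanically. The sketch assumes that single-line format (optionally prefixed with "[pid N]" when tracing with -f) and skips the "<unfinished ...>" and "resumed" line pairs strace prints when calls interleave. It is an illustration, not a complete strace parser, and the 0.1-second threshold is arbitrary.

```python
"""Sketch: list the slowest system calls in `strace -ttT` output."""
import re

LINE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+)?"          # optional prefix added by strace -f
    r"(?P<ts>\d\d:\d\d:\d\d\.\d+)\s+"  # -tt wall-clock timestamp
    r"(?P<call>\w+)\("                 # system call name
    r".*<(?P<secs>\d+\.\d+)>\s*$"      # -T time spent in the call
)


def slow_calls(lines, threshold=0.1):
    """Return (seconds, timestamp, syscall) for calls at or above threshold, slowest first."""
    hits = []
    for line in lines:
        m = LINE.match(line)
        if m and float(m.group("secs")) >= threshold:
            hits.append((float(m.group("secs")), m.group("ts"), m.group("call")))
    return sorted(hits, reverse=True)


def main(path):
    """Print the 20 slowest calls from a saved strace output file."""
    with open(path) as f:
        for secs, ts, call in slow_calls(f)[:20]:
            print(f"{ts}  {call:<12} {secs:.6f}s")
```

A trace saved with `strace -ttT -o trace.out -p <pid>` can then be summarized with `main("trace.out")`; long waits in calls such as read or semtimedop point toward the kernel side, while long gaps between quick calls point toward user mode.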
32. Tips and Tricks:
Ksplice
33. Issue: Old Kernels
Newer Kernel Releases Fix Your Bugs
Symptom: Customer systems are encountering known/fixed issues
– Examples: tcp window_size, shrink_zone, etc.
– Use new kernels for new features: NFSv4, dtrace, btrfs.
Cause: Older kernels are not 'stable'. New kernels fix bugs.
Solutions:
– Implement a periodic update schedule for kernel and OS packages, or…
– Use Ksplice to stay up to date with patches
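Checking whether a host is behind starts with comparing kernel release strings such as those reported by `uname -r`. The sketch is a simplified take on RPM-style version comparison: numeric segments compare as numbers, alphabetic segments as strings, a numeric segment beats an alphabetic one, and more segments win when all shared segments are equal. It is not rpm's exact algorithm, so any disagreement with rpm's own ordering is a limitation of the sketch.

```python
"""Sketch: simplified RPM-style comparison of kernel release strings."""
import platform
import re


def segments(release):
    """Split a release string into numeric and alphabetic segments."""
    return [int(s) if s.isdigit() else s for s in re.findall(r"\d+|[A-Za-z]+", release)]


def compare(a, b):
    """Return -1, 0 or 1 as release a is older than, equal to, or newer than b."""
    sa, sb = segments(a), segments(b)
    for x, y in zip(sa, sb):
        if x == y:
            continue
        if isinstance(x, int) != isinstance(y, int):
            return 1 if isinstance(x, int) else -1  # numeric segment wins
        return -1 if x < y else 1
    return (len(sa) > len(sb)) - (len(sa) < len(sb))


def running_is_older_than(latest):
    """True if the running kernel sorts before the given release string."""
    return compare(platform.release(), latest) < 0
```

Run on a schedule against the newest release available from your update channel, a check like this turns "implement a periodic update schedule" into an actionable alert.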
34. Ksplice: Rebootless Kernel Patching
Zero Downtime Patching for Bugs and Security Updates
Ksplice keeps your system up to date
– Integrated with ULN
– Now available in online and offline modes
Using Ksplice for Diagnostics and Patching
– Real-World NFS Example
35. Summary
Key Linux Tips and Tricks
– kexec and oswatcher
Common Issues
– memory management and configuration
Diagnostic Tools and Use Cases
Ksplice in the Datacenter
36. ORACLE LINUX PAVILION
Visit our partners and don't miss these events sponsored by QLogic:
– Smoothie Bar on Monday, Oct 1st, 2:30-5:30pm
– Ice Cream Social on Wednesday, Oct 3rd, 1-2pm
37. Oracle Linux Sessions
Tuesday, Oct 2nd
Oracle Linux TRACK SESSIONS
– GEN8726, 10:15 AM, Moscone South - 103: General Session: Oracle Linux Strategy and Roadmap
  Speakers: Wim Coekaerts and Monica Kumar, Oracle
– CON8731, 11:45 AM, Moscone South - 270: Top Technical Tips for Automatic and Secure Oracle Linux Deployments
  Speakers: Lenz Grimmer, Oracle; Martin Breslin, SEI Global; Ed Bailey, Transunion
38. Oracle Linux Sessions
Wednesday, Oct 3rd
Oracle Linux TRACK SESSION
– CON8729, 3:30 PM, Moscone South - 270: Why Switch to Oracle Linux?
  Speakers: Monica Kumar, Mike Radomski, SUNY
HANDS ON LABS
– HOL9383, 10:15 AM, Marriott Salon 14/15, YB level: Oracle Linux Package Management: Configuring and Enabling Services
– HOL9384, 11:45 AM, Marriott Salon 14/15, YB level: Oracle Linux Storage Management with LVM and Device-Mapper
39. Oracle Linux Sessions
Thursday, Oct 4th
HANDS ON LABS
– HOL9383, 12:45 PM, Marriott Salon 14/15, YB level: Oracle Linux Package Management: Configuring and Enabling Services
– HOL9384, 2:15 PM, Marriott Salon 14/15, YB level: Oracle Linux Storage Management with LVM and Device-Mapper
40. NEW: Oracle Linux Curriculum Footprint
Oracle Linux Training from Oracle University
– Unix/Linux Essentials: instructor-led and live virtual
– Oracle Linux System Administration: instructor-led and live virtual
This Oracle Linux System Administration course teaches you all the essential system administration skills and includes key information specific to Oracle Linux: Unbreakable Enterprise Kernel, Ksplice, ULN, and other key features.
Visit: oracle.com/education/linux
41. Resources
Join our communities:
– @ORCL_Linux
– Facebook.com/OracleLinux
– Blogs.oracle.com/linux
– Oracle Linux Experts Group
– YouTube.com/oraclelinuxchannel
Visit Oracle.com/linux
Download for FREE: edelivery.oracle.com/linux
43. The preceding is intended to outline our general product direction. It is intended
for information purposes only, and may not be incorporated into any contract.
It is not a commitment to deliver any material, code, or functionality, and should
not be relied upon in making purchasing decisions. The development, release,
and timing of any features or functionality described for Oracle's products
remains at the sole discretion of Oracle.
Speaker notes: Thanks for attending. For additional information and resources, you can visit us at oracle.com/linux and join our social media sites to get day-to-day updates.