1. High Availability with Novell® Cluster Services for Novell Open Enterprise Server on Linux
Kent Boogert Charles Gonzales
kboogert@novell.com cgonzales@novell.com
2. Agenda
• Storage Foundation
• High Availability Concepts
• Novell® Cluster Services 1.8 Architecture
• Installing NCS
• Cluster Resources
• NCS management tools
2 © Novell, Inc. All rights reserved.
4. Novell® Open Enterprise Server Storage Foundation
[Figure: storage foundation stack diagram. Labels shown: Management (iManager, CLI, CIMOM scripting, SMI-S, schema, Policy); Network / LAN (eDir, NFS, Samba PDC, NCP, DFS, Freeze/thaw, mon, Backup & restore / Replication / SRM); Linux vfs (+efl); NCP redirector; FS (Reiser, Ext3, NSS, CFS); NCS; EVMS; NetWare MM; I/O fence engine; SMI-S client; Disk: FireWire, SCSI, FC, iSCSI, SATA, SAS; Storage Network]
6. Fault Tolerant Environment
[Figure labels: Server Cluster; Ethernet to LAN Fabric; Fibre Channel to SAN Fabric; Storage Array with Ctrl 1 and LUN 0, LUN 1, ...]
8. Fault Tolerant Environment
[Figure labels: Server Cluster; Ethernet to LAN Fabric; Fibre Channel to SAN Fabric; Storage Array with Ctrl 1 and LUN 0, LUN 1, ...]
10. Fault Tolerant Environment
[Figure labels: Server Cluster with Dual NICs; Ethernet to LAN Fabric; Fibre Channel to SAN Fabric; Storage Array with Ctrl 1 and LUN 0, LUN 1, ...]
11. Eliminating Single Points of Failure
• Servers
• Local area network
• Storage area network
12. Fault Tolerant Environment
[Figure labels: Server Cluster with Dual NICs and Dual HBAs; Ethernet to LAN Fabric; Fibre Channel to SAN Fabric; Storage Array with Ctrl 1, Ctrl 2 and LUN 0, LUN 1, ...]
13. Eliminating Single Points of Failure
• Servers
• Local area network
• Storage area network
• Cluster communication
14. LAN Heartbeat Protocol
[Figure: four nodes on the LAN, node 0 as Master and nodes 1 to 3 as Slaves. Labels: Broadcast(s); Unicast(s) back to master from each slave; a SYS volume per node; SAN storage marked "Sharable for clustering"]
15. SAN Split Brain Detection
[Figure: four nodes on the LAN, node 0 as Master and nodes 1 to 3 as Slaves. Labels: Unicast(s) back to master from each slave; a SYS volume per node; SBD disk I/O over the SAN to the Cluster Partition (SBD), shown as "(Mirrored)"; SAN storage marked "Sharable for clustering"]
17. Directory-based Configuration
• eDirectory™ for replicated cluster configuration
– NCS:NetWare® Cluster
> Name is misleading, but we haven't changed it
– NCS:Cluster Resource
– NCS:Resource Template
– NCS:NCP Server
> Don't confuse with NCP™ Server
– NCS:Volume Resource
18. eDirectory™ Object Relationships
[Figure: eDirectory objects and their cluster-related attributes; grouped by object as far as the flattened labels allow]
• CLUSTER OBJECT = kcb_cluster.novell
– NCS:NetWare Cluster object class
– NCS:Network Address: (cluster address)
• CLUSTER NODE OBJECT = srv01_kcb_cluster.novell
– NCS:NCP SERVER object class
– NCS:GIPC NODE Number
– NCS:Network Address (Server Address)
– NCS:NCP Server
• NCP SERVER OBJECT = Srv01.novell
– NCP SERVER, SERVER object class
– network address: Server address
– NCS:NetWare Cluster = not set!!
– NCS:Volumes = not set!!
• CLUSTER POOL RESOURCE OBJECT = P69_Server.kcb_cluster.novell
– NCS:Volume Resource object class
– NCS:CRM Preferred Nodes
– NCS:NCP Server = (KCS_CLUSTER_P69_SERVER.novell
– NCS:Volumes = (kcb_cluster_VOL1.novell)
• VIRTUAL NCP SERVER OBJECT = KCB_CLUSTER P69_SERVER
– NCP SERVER, SERVER
– Network address = (resource address i.e. 69)
– NCS:Network cluster = kcb_cluster.novell
– NCS:Volumes = kcb_cluster_VOL1.novell
• NCP VOLUME OBJECT = kcb_cluster_VOL1.novell
– VOLUME RESOURCE
– HOSTSERVER = KCB_CLUSTER_P69_SERVER.novell
– HOSTSERVER RESOURCE name = VOL1
– Linux NCP Mount Point = NSS/media/nss/vol1
• Other labels in the figure
– NCS:CLUSTER RESOURCES (NCS:CRM Preferred Node)
– NCS:RESOURCE TEMPLATE (NCS:CRM Preferred Node)
19. Directory-based Configuration
• eDirectory™ for replicated cluster configuration
• Configuration is in cluster container
– Files to know who I am
> /etc/opt/novell/ncs/clstrlib.conf
> /etc/opt/novell/ncs/nodename
• Cluster configuration daemon (ncs-configd)
– Syncs configuration between LDAP and local files
• Cluster master declares current configuration
20. Group Interprocess Communication (GIPC)
• Heartbeat, membership, and multicast protocols
– Common parameters and tuning concepts
panning clusterid 56867377
heartbeat rate_usecs 1000000
censustaker tolerance 8000000
sequencer master_watchdog 1000000
sequencer slave_watchdog 8000000
sequencer retrans_max 30
• Linux kernel module creates two raw sockets
– /proc/net/raw
• Linux Ethereal/Wireshark offers basic packet decoder
– Good for tracing master / slave heartbeat packets
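The timing parameters listed above are given in microseconds. As a rough sketch (not NCS code), the shell arithmetic below converts them to seconds and shows how many heartbeat intervals fit into the tolerance window. The function names are illustrative, and reading "tolerance" as the window after which a silent node is treated as failed is an assumption based on the parameter names.

```shell
#!/bin/bash
# Illustrative helpers only; not part of NCS.

# Convert microseconds to whole seconds.
usecs_to_secs() {
    echo $(( $1 / 1000000 ))
}

# Number of heartbeat intervals that fit into the tolerance window.
heartbeats_in_tolerance() {
    local rate_usecs=$1 tolerance_usecs=$2
    echo $(( tolerance_usecs / rate_usecs ))
}

# Values taken from the slide.
echo "heartbeat interval: $(usecs_to_secs 1000000) s"
echo "tolerance: $(usecs_to_secs 8000000) s"
echo "intervals within tolerance: $(heartbeats_in_tolerance 1000000 8000000)"
```

With the slide's values, the tolerance spans eight heartbeat intervals, which is a useful sanity check when tuning either number.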
21. Split Brain Detector (SBD)
• Nodes coordinate membership via shared disk
– Linux kernel module does direct I/O
> e.g. /dev/sda1 or /dev/evms/cluster.sbd
• Nodes locate matching SBD given cluster name
– SBD partition is created at cluster creation
> Linux device special filename is unimportant
• See man sbdutil for Linux SBD command line utility
– e.g. sbdutil -v
22. Cluster Resource Manager (CRM)
• Manages cluster-wide resource states
– Linux kernel module executes distributed FSM
• Cluster resource daemon (ncs-resourced)
– Forks once per resource action
– Forks again for load, unload, and monitor script
> stdout & stderr redirected to /var/opt/novell/log/ncs/*.out
– Returns script status to kernel
• Cluster resource scripts run from local files
– /bin/bash is default interpreter
– Shell functions in /opt/novell/ncs/lib/ncsfuncs
23. Load, Unload, and Monitor Script Examples
Cluster Pool Resource Load script
#!/bin/bash
. /opt/novell/ncs/lib/ncsfuncs
# Activate the pool, mount its volumes, then bring up the virtual server address
exit_on_error nss /poolact=AUTO_POOL_01
exit_on_error ncpcon mount AUTO_VOL_012=253
exit_on_error ncpcon mount AUTO_VOL_011=254
exit_on_error add_secondary_ipaddress 151.155.189.131
exit_on_error ncpcon bind --ncpservername=CGAO_SP3_PR6_CLUSTER_AUTO_POOL_01_SERVER \
    --ipaddress=151.155.189.131
exit 0
Cluster Pool Resource Unload script
#!/bin/bash
. /opt/novell/ncs/lib/ncsfuncs
# Tear down in reverse order; keep going even if a step fails
ignore_error ncpcon unbind --ncpservername=CGAO_SP3_PR6_CLUSTER_AUTO_POOL_01_SERVER \
    --ipaddress=151.155.189.131
ignore_error del_secondary_ipaddress 151.155.189.131
ignore_error nss /pooldeact=AUTO_POOL_01
exit 0
Cluster Pool Resource Monitor script
#!/bin/bash
. /opt/novell/ncs/lib/ncsfuncs
# Check the pool, the secondary IP address, and each volume
exit_on_error status_fs /dev/evms/AUTO_POOL_01 /opt/novell/nss/mnt/.pools/AUTO_POOL_01 nsspool
exit_on_error status_secondary_ipaddress 151.155.189.131
exit_on_error ncpcon volume AUTO_VOL_012
exit_on_error ncpcon volume AUTO_VOL_011
exit 0
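The scripts above wrap each command in exit_on_error or ignore_error from ncsfuncs, whose definitions the slides do not show. The sketch below is a hypothetical re-implementation, assuming that exit_on_error aborts the script with a non-zero status when the wrapped command fails and that ignore_error runs the command and always continues. The sketch_ prefix marks the names as illustrative, not as the real ncsfuncs functions.

```shell
#!/bin/bash
# Hypothetical stand-ins for the ncsfuncs helpers; assumed semantics only.

# Run a command; if it fails, report it and exit the script with its status.
sketch_exit_on_error() {
    "$@" || {
        local rc=$?
        echo "command failed (rc=$rc): $*" >&2
        exit "$rc"
    }
}

# Run a command and carry on regardless of its exit status.
sketch_ignore_error() {
    "$@" || true
}

# Demonstration in subshells so the calling shell keeps running.
( sketch_exit_on_error true && echo "load step ok" )
( sketch_exit_on_error false; echo "never printed" ) 2>/dev/null \
    || echo "subshell exited with status $?"
( sketch_ignore_error false; echo "unload continues" )
```

Under these assumed semantics, a load or monitor script stops at the first failing step, while an unload script always runs every teardown step.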
24. Cluster Resources
Cluster documentation for most OES2 services
– AFP
– Certificate Server
– CIFS
– DFS
– DHCP server
> On Linux POSIX
> On Novell Storage Services™
– DNS Server
– iFolder 3.8
– iPrint
– MySQL
– NetStorage
– QuickFinder
– Samba
– DST (shadow volumes)
– Linux POSIX volumes
– NCP™ volumes
– NSS
25. Cluster Resource Templates
• ArkManager
• DHCP
• DNS
• Generic_FS
• Generic_IP
• iFolder
• iPrint
• MySQL
• Samba
• XenLive*
• Xen*
26. Cluster Management Agent (CMA)
• Common cluster management interface
– Linux kernel module exports management files
> /admin/Novell/Cluster
• Enables iManager and cluster command line interface
– Built on Novell® admin file system (adminfs)
• Direct access to XML formatted cluster state
– e.g. /admin/Novell/Cluster/ResourceState.xml
• See NCS for Linux Administration Guide section 8.17
– http://www.novell.com/documentation/oes2/clus_admin_lx/data/h4hgu4hs.html
27. Handling Fault Conditions
Split brain
– Server isolated from the LAN
Fatal SAN error
– Server isolated from the SAN
Poison pills
– panic
> /proc/sys/kernel/panic
– run eDirectory™ NCS:NodeIsolationScript
hangcheck_timer
28. Enterprise Volume Management System (EVMS)
• EVMS is an extensible host-based disk volume manager
– Extended by Novell® via shared library plugins
> NetWare® segment and cluster managers
• EVMS Cluster Extension (ECE)
– Manage any node's local plus shared storage
– Novell Cluster Services plugin
> http://evms.sourceforge.net/cluster
• Cluster awareness similar to NetWare media manager
– e.g. Provides safe online file system expansion
• Cluster / globally unique persistent device naming
30. Installing Novell® Cluster Services
YaST based installation
– YaST2 select Open Enterprise Server
> OES Install and Configuration
Local or remote
– Specify how NCS will access eDirectory™ via LDAP
New Cluster
– Enter cluster FDN, IP address and optional device for SBD
Existing Cluster
– Enter cluster FDN
32. Cluster Migrations?
Rolling cluster conversion
– Cluster resources remain available
Three simple steps
– Decommission NetWare® on server
– Install Novell® Open Enterprise Server Linux onto server
– Add Linux server to existing cluster
Rollback to NetWare
– Convert Linux server back to NetWare
Sequential and parallel options
– Convert one-by-one or many at once
33. Automatic Script Translation
During cluster conversion...
– NetWare® resources fail over to Linux
– ncs-resourced translates scripts on-the-fly
Translates NetWare commands
– e.g. cluster cvsbind add vserver 10.0.0.0 becomes
ncpcon bind --ncpservername=vserver --ipaddress=10.0.0.0
Committing the conversion
cluster convert preview [all | (resource_name)]
cluster convert commit
> script translation saved to eDirectory™
> update cluster revision number = 282
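As a rough illustration of the kind of rewrite in the cvsbind example above, the sketch below maps a NetWare-style cluster cvsbind add line to the corresponding ncpcon bind line. It is not how ncs-resourced is implemented and handles only this one command form. The function name translate_cvsbind is hypothetical.

```shell
#!/bin/bash
# Hypothetical one-command translator; not the ncs-resourced implementation.
translate_cvsbind() {
    local line=$1
    local -a words
    # Split the line into whitespace-separated words.
    read -r -a words <<< "$line"
    # Expected form: cluster cvsbind add <server_name> <ip_address>
    if [ "${#words[@]}" -eq 5 ] &&
       [ "${words[0]} ${words[1]} ${words[2]}" = "cluster cvsbind add" ]; then
        echo "ncpcon bind --ncpservername=${words[3]} --ipaddress=${words[4]}"
    else
        # Pass through anything that is not recognized.
        echo "$line"
    fi
}

translate_cvsbind "cluster cvsbind add vserver 10.0.0.0"
```

Unrecognized lines are passed through unchanged, so the sketch can be applied line by line to a whole script without losing content.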
34. Rules for Mixed Clusters
• Online storage reconfiguration is not supported
• Can't add NetWare® nodes using deployment manager
• Resources created on Linux won't run on NetWare
35. Cluster Management Tools
cluster
– Perl-based command line
iManager snapins
– Common interface for NetWare® and Linux
ncs-emaild
– User-space cluster event email daemon
sbdutil
– Split brain detector utility
36. Tips, Tricks, and Troubleshooting
• Getting output
– grep ncs- /var/log/messages
– grep CLUSTER /var/log/messages
– sbdutil -v
– cluster stats display
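The two grep commands above can be combined into a single pass over the log. A minimal sketch, assuming only that cluster messages contain "ncs-" or "CLUSTER" as in those commands. The function name ncs_log_lines and its optional log-path argument are illustrative.

```shell
#!/bin/bash
# Print cluster-related lines from a syslog-style file
# (default: /var/log/messages, as on the slide).
ncs_log_lines() {
    local logfile=${1:-/var/log/messages}
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 1
    fi
    # Same two patterns as the slide, in one grep; "no match" is not an error here.
    grep -E 'ncs-|CLUSTER' "$logfile" || true
}
```

For example, `ncs_log_lines | tail -n 50` shows the most recent cluster messages, and `ncs_log_lines /path/to/copy` works on a saved copy of the log.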
37. Tips, Tricks, and Troubleshooting
• Getting MORE output
– /admin/Novell/Cluster/EventLog.xml
> echo -n "trace crm on" > /proc/ncs/cluster
> Can change in ldncs
– export NCSCONFIGD=1
– export NCSRESOURCED=1
38. Tips, Tricks, and Troubleshooting
• SBD problems
– Use sbdutil -f to search for the cluster's SBD
• Keep the server from rebooting after panic
– echo -n XX > /proc/sys/kernel/panic
> Where XX is the number of seconds to wait before rebooting after a panic
> 0 disables automatic rebooting after a panic
– echo -n X > /proc/sys/kernel/panic_on_oops
> When X is 0, an oops does not trigger a panic, so no reboot follows
> When X is 1, an oops triggers a panic, and the panic timeout above then applies
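A minimal sketch of setting the panic timeout with basic input validation. The function name set_panic_timeout is illustrative, and the optional second argument (the target file) exists only so the function can be tried against a scratch file instead of /proc. Writing to the real /proc entry requires root.

```shell
#!/bin/bash
# Write a panic reboot delay (in seconds) to the kernel's panic setting.
# 0 disables automatic reboot after a panic.
set_panic_timeout() {
    local seconds=$1
    local target=${2:-/proc/sys/kernel/panic}
    # Accept only a non-empty string of digits.
    case $seconds in
        ''|*[!0-9]*)
            echo "usage: set_panic_timeout <seconds> [file]" >&2
            return 2
            ;;
    esac
    echo -n "$seconds" > "$target"
}

# Dry run against a scratch file rather than the live kernel setting.
scratch=$(mktemp)
set_panic_timeout 0 "$scratch" && echo "wrote: $(cat "$scratch")"
rm -f "$scratch"
```

A value written this way lasts only until the next reboot.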
39. More Tips and Tricks
• Resources going comatose
> Review /var/opt/novell/log/ncs/<resource>.load.out
> Review /var/opt/novell/log/ncs/<resource>.unload.out
– Scripts can be run outside of the cluster, but be careful
– When launched from ncs-resourced, the script may see a different environment
> e.g. PATH is different, so use the full path to each executable
• Update schema prior to install
– Be sure to patch!!!!
– YaST2
> Open Enterprise Server section - Schema tool
• Maintenance mode
– cluster maintenance on/off
> Ignores loss of heartbeat. Use when doing LAN work, upgrades, etc.
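A small sketch for the comatose-resource check above: it builds the load and unload log paths for a resource and prints the tail of each one that exists. The function names are illustrative, and the resource name in the example call is only a placeholder. The log directory is the one named on the slide, and the optional directory argument is there only for trying the functions elsewhere.

```shell
#!/bin/bash
# Print the load and unload log paths for a resource.
ncs_resource_logs() {
    local resource=$1
    local logdir=${2:-/var/opt/novell/log/ncs}
    echo "$logdir/$resource.load.out"
    echo "$logdir/$resource.unload.out"
}

# Show the last lines of each of those logs that is readable.
ncs_tail_resource_logs() {
    local f
    ncs_resource_logs "$@" | while IFS= read -r f; do
        if [ -r "$f" ]; then
            echo "== $f"
            tail -n 20 "$f"
        fi
    done
}

# Placeholder resource name, for illustration only.
ncs_resource_logs MY_RESOURCE
```

Running `ncs_tail_resource_logs <resource>` on the node where the resource went comatose shows the most recent script output without opening each file by hand.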
40. Particularly Hot
• Deleting nodes
– Delete all the server objects
– On master node execute
> cluster exec "/opt/novell/ncs/bin/ncs-configd.py -init"
• Console commands
– cluster help
– Most non-configuration items can be done from command line