Clustering is a feature introduced in AOS 8.0 that enables seamless roaming of clients between APs, hitless client failover, and load balancing of users across the Mobility Controllers in the cluster. This presentation covers the configuration required to create a cluster of Mobility Controllers managed by the same Mobility Master.
Check out the webinar recording where this presentation was used:
https://community.arubanetworks.com/t5/Wired-Intelligent-Edge-Campus/Airheads-Tech-Talks-Advanced-Clustering-in-AOS-8-x/td-p/506441
2. AGENDA
Cluster load balancing (AP and client)
AP termination in cluster
AP move
Authorization
Cluster troubleshooting
3. What is Clustering?
Clustering is a combination of multiple managed devices working together to provide high
availability to all clients, ensuring service continuity when a failover occurs.
4. Why Clustering?
The AOS 8 clustering feature was designed primarily for mission-critical networks. The goal is to provide full redundancy to APs and WLAN clients should one or more cluster members fail.
Deploying Aruba Mobility Controllers (MCs) as managed devices in a cluster provides several benefits:
Seamless campus roaming
Client Stateful Failover
AP and Client load balancing
5. Cluster AP Load Balancing
Why Load Balance APs?
[Diagram: Mobility Master/Standby managing a cluster of MCs at the headquarters]
1. Easy scaling of cluster nodes
2. Eliminates manual AP distribution via LMS-IP
3. Configurable feature, disabled by default
6. AP Load Balancing
AP Master in a Cluster Deployment
[Diagram: Mobility Master/Standby with cluster nodes MC1-MC4]
1. L2 connection: cluster nodes' controller-ips in the same VLAN
   i. Create a VRRP instance among the cluster nodes
   ii. AP Master == VRRP VIP
2. L3 connection: nodes' controller-ips in different VLANs
   i. AP Master == one of the nodes' controller-ip
7. AP Load Balancing
AP Distribution in a Cluster Deployment
[Diagram: Mobility Master/Standby with cluster nodes MC1-MC4]
1. Planned, via LMS-IP
2. Automated, via AP Load Balancing
8. AP Load Balancing
AP Distribution with LMS-IP
[Diagram: Mobility Master/Standby with cluster nodes MC1-MC4; APs terminate according to their configured LMS (LMS=MC1 ... LMS=MC4)]
1. Planned AP distribution among cluster members (AP Load Balancing disabled)
2. AP Group -> AP System Profile -> LMS-IP (example below)
3. APs receive the LMS-IP from the AP Master
4. LMS-IP == AAC (AP Anchor Controller)
5. S-AAC (Standby AAC) assigned by the Cluster Leader
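The LMS-IP referenced in item 2 is set in the AP system profile. A minimal example follows; the profile name and addresses are made up, and the parameter names are quoted from memory of the ArubaOS 8 configuration, so verify them on your release:

    ap system-profile "hq-aps"
        lms-ip 10.1.1.1
        bkup-lms-ip 10.1.1.2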
9. AP Load Balancing
AP Distribution with Load Balancing
[Diagram: Mobility Master/Standby with cluster nodes MC1-MC4]
1. A new AP joining the cluster first contacts the AP Master
2. The AP is redirected to its Active AAC (A-AAC)
3. The Cluster Leader assigns the A-AAC for all APs
4. Once the AP is up on its A-AAC, a Standby AAC is also selected by the Cluster Leader
5. Active and Standby AAC assignments are distributed based on AP load
10. AP Load Balancing
Cluster Nodes' AP Load
[Diagram: Mobility Master/Standby with cluster nodes MC1-MC4]
1. Each cluster node ends up with Active and Standby APs
2. Active AP Load % = Active APs / platform AP capacity
3. Total AP Load % = (Active APs + Standby APs) / platform AP capacity (see the sketch below)
4. Prior to AOS 8.3: the LB algorithm uses Total Load %
5. From AOS 8.3: the LB algorithm uses Active Load %
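A minimal Python sketch of these two ratios; the AP counts and the platform capacity used here are illustrative, not taken from the deck:

    # Load percentages used by cluster AP load balancing.
    def active_ap_load_pct(active_aps, platform_ap_capacity):
        return 100.0 * active_aps / platform_ap_capacity

    def total_ap_load_pct(active_aps, standby_aps, platform_ap_capacity):
        return 100.0 * (active_aps + standby_aps) / platform_ap_capacity

    # Example node: 200 active APs and 180 standby APs on a platform supporting 512 APs.
    print(round(active_ap_load_pct(200, 512), 1))       # 39.1 -> metric used from AOS 8.3
    print(round(total_ap_load_pct(200, 180, 512), 1))   # 74.2 -> metric used prior to AOS 8.3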
11. AP Load Balancing
Load Balancing Algorithm (Prior to AOS 8.3)
[Diagram: Mobility Master/Standby with cluster nodes MC1-MC4]
1. The Cluster Leader considers Total AP load
2. Identify the nodes with max and min total load percentage
3. AP load balancing is triggered when:
   Active AP Rebalance Threshold > 50%
   Active AP Unbalance Threshold > 5%
4. Standby AP load is redistributed first
5. Active AP load is redistributed afterwards, until the total load is balanced
12. AP Load Balancing
Load Balancing Algorithm (AOS 8.3)
[Diagram: Mobility Master/Standby with cluster nodes MC1-MC4]
1. The Cluster Leader considers Active AP load
2. Identify the nodes with max and min active load percentage
3. Identify the nodes with max and min total load percentage
4. AP load balancing is triggered when (see the sketch below):
   Active AP Rebalance Threshold > 50%
   Active AP Unbalance Threshold > 5%
5. Active AP redistribution is initiated to re-establish AP load balance
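A small Python sketch of this trigger check; the threshold values follow the slides, while the function name and the example load figures are illustrative:

    # Decide whether AP load balancing should run (AOS 8.3 behaviour as described on the slides).
    REBALANCE_THRESHOLD = 50.0   # Active AP Rebalance Threshold (%)
    UNBALANCE_THRESHOLD = 5.0    # Active AP Unbalance Threshold (%)

    def ap_lb_triggered(active_load_pct):
        """active_load_pct: dict mapping cluster node -> active AP load percentage."""
        max_node = max(active_load_pct, key=active_load_pct.get)
        min_node = min(active_load_pct, key=active_load_pct.get)
        spread = active_load_pct[max_node] - active_load_pct[min_node]
        triggered = active_load_pct[max_node] > REBALANCE_THRESHOLD and spread > UNBALANCE_THRESHOLD
        return triggered, max_node, min_node

    loads = {"MC1": 62.0, "MC2": 48.0, "MC3": 35.0, "MC4": 30.0}
    print(ap_lb_triggered(loads))   # (True, 'MC1', 'MC4') -> move active APs from MC1 towards MC4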
13. AP Load Balancing
Load Balancing Algorithm (AOS 8.3), continued
[Diagram: Mobility Master/Standby with cluster nodes MC1-MC4]
6. If Active APs cannot be moved, a Standby AP move is initiated
7. Standby APs move from the member with max total load to the member with min total load
8. Periodic load rebalancing frequency is 1 minute
9. AP Rebalance count = 30
16-17. Cluster Client Load Balancing
Why Load Balance Clients?
[Diagram: Mobility Master/Standby managing a cluster of MCs at the headquarters]
1. The hashing algorithm ultimately leads to uneven client distribution
2. Not an efficient use of system resources
3. Load balance clients to optimally load users across the cluster
18. Cluster Client Load Balancing
How is the load on a controller calculated?
1. Identify the controller model
2. Get the current client count on each controller
3. Get the total client capacity for each controller
4. The ratio of the two gives the load (checked in the sketch below):

   Model   Clients   Capacity   Load
   7240    3000      32000      9%
   7220    2000      24000      8.3%
   7210    1000      16000      6.2%

5. Based on the load and additional triggers, load balancing takes place
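A quick check of these ratios in Python, using the client counts and capacities from the slide:

    # Client load = current client count / platform client capacity.
    for model, clients, capacity in [("7240", 3000, 32000), ("7220", 2000, 24000), ("7210", 1000, 16000)]:
        print(model, "{:.1f}%".format(100.0 * clients / capacity))
    # 7240 9.4%   (the slide rounds this to 9%)
    # 7220 8.3%
    # 7210 6.2%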
19. Cluster Client Load Balancing
Load Balancing Triggers
1. Active Client Rebalance Threshold (50%): active load on any cluster member
2. Standby Client Rebalance Threshold (75%): standby load on any cluster member
3. Unbalance Threshold (5%): load difference between the max loaded and min loaded cluster nodes
20. Cluster Client Load Balancing
Active Clients Load Balancing
1. Active client load balancing requires both triggers to fire simultaneously (see the sketch below):
   i. Active Client Rebalance Threshold (50%)
   ii. Unbalance Threshold (5%)
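A minimal Python sketch of this two-condition check; the threshold values come from the slides, and the function and variable names are illustrative:

    ACTIVE_REBALANCE_THRESHOLD = 50.0   # % of the platform's client capacity
    UNBALANCE_THRESHOLD = 5.0           # % gap between the most and least loaded nodes

    def active_client_lb_triggered(load_pct_by_node):
        max_load = max(load_pct_by_node.values())
        min_load = min(load_pct_by_node.values())
        # Both conditions must hold at the same time before active clients are moved.
        return max_load > ACTIVE_REBALANCE_THRESHOLD and (max_load - min_load) > UNBALANCE_THRESHOLD

    print(active_client_lb_triggered({"MC1": 53.0, "MC2": 41.6, "MC3": 6.2}))    # True
    print(active_client_lb_triggered({"MC1": 48.0, "MC2": 46.0, "MC3": 45.0}))   # False (no node above 50%)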
22. Cluster Client Load Balancing
Load balance triggering example
1. Identify the controller model
2. Get the current client count on each controller
3. Get the total client capacity for each controller
4. The ratio of the two gives the load:

   Model   Clients   Capacity   Load
   7240    17000     32000      53%
   7220    10000     24000      41.6%
   7210    1000      16000      6.2%

5. LB triggered -> rebalance clients from the 7240 towards the 7210 (the arithmetic is checked below)
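Plugging these numbers into the trigger conditions from slides 19-20, in Python:

    loads = {"7240": 100.0 * 17000 / 32000,   # ~53.1%
             "7220": 100.0 * 10000 / 24000,   # ~41.7%
             "7210": 100.0 * 1000 / 16000}    # ~6.2%
    max_load, min_load = max(loads.values()), min(loads.values())
    # Above the 50% rebalance threshold and more than 5% apart -> rebalance from the 7240 towards the 7210.
    print(max_load > 50.0, (max_load - min_load) > 5.0)   # True True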
23. AP termination in cluster
A new AP finds the master through the usual master discovery process. The MC sends the AP its name and AP group, as well as the LMS-IP. The LMS-IP can point to a standalone MC or to a cluster of MCs; the AP then attempts to contact the LMS-IP. If the MC returns a node list, the AP is part of a cluster, and the LMS parameter is ignored, since the node list takes priority.
24. AP termination in cluster
Once in communication with an MC in the cluster, the AP may terminate on that MC or be redirected to its A-AAC. The AP sends a hello packet to the A-AAC and receives its full configuration. If there is no reply from any of the MCs in the node list, the AP tries the LMS-IP; if the LMS-IP does not respond, the AP tries the backup LMS-IP.
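A compact Python sketch of this fallback order; the function, its arguments and the IP addresses are illustrative, not an AOS API:

    def pick_termination_target(node_list, lms_ip, backup_lms_ip, reachable):
        """node_list: cluster node list (A-AAC first); reachable: IPs that answer the AP's hellos."""
        for ip in node_list:             # cluster case: the node list takes priority over the LMS parameter
            if ip in reachable:
                return ip
        if lms_ip in reachable:          # no cluster member answered: fall back to the LMS-IP
            return lms_ip
        if backup_lms_ip in reachable:   # last resort: the backup LMS-IP
            return backup_lms_ip
        return None                      # nothing answered: the AP keeps retrying discovery

    print(pick_termination_target(["10.1.1.1", "10.1.1.2"], "10.1.1.1", "10.2.1.1", {"10.1.1.2"}))   # 10.1.1.2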
25. The apmove command
This command allows you to manually reassign an AP or an AP group to any managed device. It is useful when you want to move specific APs to another managed device. The command has to be executed on the cluster leader.
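For example, to move a single AP to another cluster member (the option names are recalled from the AOS 8 CLI reference rather than taken from this deck, so treat them as an assumption and verify on your release):

    apmove <ap-mac-address> target-lms-ip <target-controller-ip>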
26. Cluster CoA (Change of Authorization) support
The UAC (User Anchor Controller) role provides end-user redundancy. The UAC handles all wireless client traffic: association/disassociation, authentication, and all unicast traffic between itself and the client. Regardless of where a client roams, its UAC remains the same.
If the A-UAC fails, the user seamlessly fails over to the S-UAC, which of course has a different IP.
The authorization module authenticates clients on the A-UAC and sets the A-UAC IP as the NAS-IP. The RADIUS server stores that NAS-IP (the A-UAC IP) against the client in its database and later uses the same IP to change the client's state or attributes.
The challenge is that when the client moves to a new UAC, the authentication server is not updated, which means that authorization transactions will fail.
To overcome this, VRRP instances spanning the cluster members are used to enable interaction between the cluster and the authorization server. We refer to this as Authorization Server Interaction (ASI).
27. Cluster CoA Support
How is CoA supported in a cluster?
[Diagram: Mobility Master/Standby with N cluster nodes]
1. Multiple VRRP instances (also used for Simplified Cluster Upgrade)
2. Reserved VRRP IDs: 220 - 255
3. N VIPs for N nodes
4. The VIP is used as the NAS-IP in RADIUS requests
28. Cluster CoA Support
VRRP instances
[Diagram: Mobility Master/Standby with cluster nodes MC1-MC3]
1. 3 nodes <=> 3 VRRP instances <=> 3 VIPs
2. VIPs: VIP1, VIP2, VIP3; VRRP IDs: 220, 221, 222
   VRRP priorities:

   VRRP ID   MC1   MC2   MC3
   220       255   215   235
   221       235   255   215
   222       215   235   255
29-35. Cluster CoA Support
UAC Change due to Controller Failure
[Diagram: MC1, MC2 and MC3 share VRRP instance 1 (VIP1); MC1 is the VRRP master, MC2 is Backup1, MC3 is Backup2; the client's UAC is MC1 and its S-UAC is MC3]
STEP 1: The user authenticates against CPPM (the NAS-IP sent is VIP1)
STEP 2: MC1 fails; the client's UAC changes from MC1 to MC3
STEP 3: The RADIUS server sends a CoA message to VIP1
STEP 4: MC2, now the VRRP master for VIP1, forwards the CoA to all cluster nodes
STEP 5: MC3 returns a CoA-ACK to the RADIUS server
48. Troubleshooting Commands
Tech Support Logs
1. Cluster tech support at the MC: show cluster-tech-support
2. Cluster tech support at the AP: show ap cluster-tech-support ap-name <AP_NAME>
Speaker notes (slide 27 onwards):
CoA is supported within a cluster using the following mechanism:
1. Multiple VRRP instances are dynamically created: one instance per cluster node, where that node is the master of its instance and the other nodes act as backups.
2. The VRRP IDs of those instances start at 220 and go up to 255, and are reserved by the system.
3. The VIP of each VRRP instance is inserted by the master of that instance as the NAS-IP when sending RADIUS requests to the RADIUS server.
Slide 28 shows an example of a 3-node cluster in which 3 VRRP instances are dynamically created with 3 VIPs. The VRRP IDs are 220, 221 and 222.
The priorities are dynamically assigned: the master of the instance with ID 220 (MC1) gets 255, the first backup within that instance gets 255 - 20 = 235, and the second backup gets 235 - 20 = 215.
The same applies to the second VRRP instance (ID 221, owned by MC2), where MC2 gets the highest priority of 255, followed by MC1 as Backup1 with 235 and MC3 as Backup2 with 215. The third instance follows the same pattern.
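A small Python sketch that reproduces this priority scheme; the function is illustrative, and the backup ordering is chosen only so that the output matches the table on slide 28 (the ordering AOS uses internally is not stated in the deck):

    def coa_vrrp_priorities(nodes, base_id=220, master_priority=255, step=20):
        """One VRRP instance per node; the owner gets 255 and each backup is 20 lower."""
        table = {}
        for i, owner in enumerate(nodes):
            vrrp_id = base_id + i
            table[vrrp_id] = {owner: master_priority}
            # Backup order picked to reproduce slide 28's table.
            backups = nodes[:i][::-1] + nodes[i + 1:][::-1]
            for rank, node in enumerate(backups, start=1):
                table[vrrp_id][node] = master_priority - step * rank
        return table

    for vrrp_id, priorities in coa_vrrp_priorities(["MC1", "MC2", "MC3"]).items():
        print(vrrp_id, priorities)
    # 220 {'MC1': 255, 'MC3': 235, 'MC2': 215}
    # 221 {'MC2': 255, 'MC1': 235, 'MC3': 215}
    # 222 {'MC3': 255, 'MC2': 235, 'MC1': 215}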
In this step (slides 29 onwards), we consider the case of MC1 failing while the client still has an active session.
When MC1 fails, several events take place:
* the AP fails over to MC2;
* the client state is moved to the S-UAC, which now becomes the UAC for that client;
* MC2 becomes the VIP1 owner.
A CoA request then arrives from CPPM with destination VIP1. As the VRRP master and owner of VIP1, MC2 picks up the CoA request and unicasts it to all nodes in the cluster (in our case, with MC1 down, only MC3, the UAC for that client, is left). MC3 sends a CoA-ACK back to CPPM after it successfully completes the change requested in the CoA.
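A toy Python model of this forwarding behaviour; everything here (function, node names, client name) is illustrative rather than an AOS interface:

    def handle_coa(live_nodes, uac_by_client, client):
        """The VIP owner fans the CoA request out to every live node; only the client's UAC applies it."""
        acks = []
        for node in live_nodes:
            if uac_by_client.get(client) == node:
                acks.append((node, "CoA-ACK"))
        return acks

    # MC1 has failed, MC2 now owns VIP1, and the client's UAC has moved to MC3.
    print(handle_coa(["MC2", "MC3"], {"client-1": "MC3"}, "client-1"))   # [('MC3', 'CoA-ACK')]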
Later slides show the CLI commands used to create the cluster, and the CLI output of the VRRP instances on two cluster members showing:
* VRRP 149, created by the administrator, with its VIP used as the AP master;
* the 3 dynamically created VRRP instances described earlier (220, 221 and 222), along with their dynamically assigned priorities.