Fault tolerant and scalable ibm mq

Integration Technical Conference 2019
Build a truly fault tolerant and
scalable IBM MQ solution
David Ware
Chief Architect, IBM MQ

© 2019 IBM Corporation
Fault Toleration
2
Queue Manager
Single
Queue Manager Queue Manager Queue Manager Queue Manager
Multiple
100%
0%
100%
0%
availabilityavailability
More is better for availability

Queue Manager Queue Manager Queue Manager
It’s not just the queue managers…
3
Client Client Client
Step 1
Horizontally scale the application into multiple instances, all
performing the same role
A queue manager works better when there are multiple
applications working in parallel Queue Manager
Step 2
Horizontally scale the queue managers
Create multiple queue managers with the ‘same’ configuration
Distribute the application instances across the queue managers
Client Client Client Client Client Client

Scaling
4
Single Multiple
Queue Manager
Queue Manager Queue Manager Queue Manager Queue Manager Queue Manager
x1x4 x1x2x3x4x5x6x7x8xn
More is better for scalability

5
Let’s go through that
availability thing one step at
a time…

Node
Single, non-HA queue manager
6
App App App
Queue Manager
100%
0%
100%
0%
System
availability
Message
availability
While the QMgr is down
all of the applications
need to wait for it to be
restarted
All queued messages
require the QMgr to restart
How much messaging
work can proceed
Proportion of queued
message that are available

Node B Node CNode A
Single HA queue manager
7
App App App
Queue Manager Queue ManagerQueue Manager
Highly available
queue manager
and queue
instances
100%
0%
100%
0%
HA queue managers are
restarted quickly (typically
a ‘few’ seconds)
System
availability
Message
availability
Here we have one active instance of the queue
manager with two replica, standby, instances ready
to take over.
(This happens to be the MQ RDQM HA model, other
solutions like multi-instance queue managers are
subtly different (only one standby) but essentially
the same)

Node B Node CNode A
Multiple HA queue managers
8
Queue Manager 1
App App App AppAppApp App
Queue Manager 1Queue Manager 1
Highly available
queue manager
and queue
instances
1/3 of message
traffic
Queue Manager 2 Queue Manager 2Queue Manager 2
Highly available
queue manager
and queue
instances
1/3 of message
traffic
Queue Manager 3Queue Manager 3Queue Manager 3
Highly available
queue manager
and queue
instances
1/3 of message
traffic
AppApp
To further increase the
availability you need to
remove the single point of
failure that is a queue
manager.
For this, create multiple
queue managers and stripe
the messaging workload
across them by defining the
“same” queue on all of
them.
Each message is only
queued on a single queue
manager but the
multiple queue managers
mean any one outage is
confined to a subset of the
workload

Node B Node CNode A
Multiple HA queue managers – ordered consumption
9
Queue Manager 1
Highly available
queue manager
and queue
instances
1/3 of message
traffic
Highly available
queue manager
and queue
instances
1/3 of message
traffic
Highly available
queue manager
and queue
instances
1/3 of message
traffic
AppApp
Connect: QMgr1 Connect: QMgr2 Connect: QMgr3
100%
0%
100%
0%
Messages that require
ordering need to go to the
same queue manager.
One approach to achieve
this is for application
instances to connect to a
specific queue manager
based on their ordered
stream of messages.
This ensures messages are
processed in order per-
group
While a QMgr is
down a subset of
application need to
wait for it to restart
System
availability
Message
availability
Queued messages are
confined to a single
queue manager.
Therefore, the QMgr
needs to restart to
make those messages
available. Hence the
need to make each
queue manager highly
available

Node B Node CNode A
Multiple HA queue managers – unordered consumption
10
Queue Manager 1
Highly available
queue manager
and queue
instances
1/3 of message
traffic
Highly available
queue manager
and queue
instances
1/3 of message
traffic
Highly available
queue manager
and queue
instances
1/3 of message
traffic
AppApp
Connect: QMgrGroup
100%
0%
100%
0%
Application instances
can connect to any
queue manager as the
order that they are
queued across the
multiple instances is not
a concern.
Connecting across a
group can be achieved
with a CCDT queue
manager groupThere is always a
running QMgr for an
application to connect
to for new work
Queued messages are
still confined to a single
queue manager.
Therefore, the QMgr
still needs to restart to
make those particular
messages available.
System
availability
Message
availability

Single vs. Multiple
11
Single
o Simple
o Invisible to applications
o Limited by maximum system size
o Liable to hit internal limits
o Not all aspects scale linearly
o Restart times can grow
o Every outage is high impact
Multiple
o Unlimited by system size
o All aspects scale linearly
o More suited to cloud scaling
o Reduced restart times
o Enables rolling upgrades
o Tolerate partial failures
o Visible to applications – limitations apply
o Potentially more complicated
Queue Manager

uniform cluster…
12
Try to stop thinking about each individual queue manager and start thinking
about them as a cluster

13
The fundamentals on MQ Clusters
(skip this if you know it)

MQ clustering
What MQ Clusters provide :
14
Availability routing
Horizontal scaling of queues
Configuration directory
Dynamic registration and lookup
Dynamic channel management
Dynamic message routing
Foundation

cluster
FR
directoryQMGRFR
QMGR
QMGR
QMGR
QMGR
QMGR
Cluster management
15
Full repositories learn
everything about the
cluster. These are the
“directory servers” The other queue
managers only learn
about what they need.
These are ”partial
repositories”

cluster
FR
directoryQMGRFR
QMGR
QMGR
QMGR
QMGR
QMGR
App
?
Queue managers persistently
cache their knowledge of the
cluster resources, limiting
interactions with the full
repositories
Cluster management
16
Full repositories will pass on the details
of cluster queues and the connection
details of the queue manager they are
located on
Details of a clustered queue or
topic are sent to the full
repositories in the cluster

cluster
FR
directoryQMGRFR
QMGR
QMGR
QMGR
QMGR
QMGR
App
Unused cluster information
times out and is removed
from the local cache
Cluster management
17

cluster
FR
directoryQMGRFR
QMGR
QMGR
QMGR
QMGR
QMGR
If a queue manager is simply switched
off, rather than exiting the cluster in a
controlled manner, it will eventually be
cleaned from every queue manager’s
cache
Cluster management
18

19
How to set up a cluster
(skip this too if you know it)

FR1
Step 1: Create your two full repositories
ALTER QMGR REPOS(‘CLUS1’)
DEFINE CHANNEL(‘CLUS1.FR1’) CHLTYPE(CLUSRCVR)
CLUSTER(‘CLUS1’) CONNAME(FR1 location)
DEFINE CHANNEL(‘CLUS1.FR2’) CHLTYPE(CLUSSDR)
FR2
ALTER QMGR REPOS(‘CLUS1’)
DEFINE CHANNEL(‘CLUS1.FR2’) CHLTYPE(CLUSRCVR)
DEFINE CHANNEL(‘CLUS1.FR1’) CHLTYPE(CLUSSDR)
FR2
FR1
20

FR1
FR2
QMGR1
DEFINE CHANNEL(‘CLUS1.QMGR1’)
CHLTYPE(CLUSRCVR)
CLUSTER(‘CLUS1’)
CONNAME(QMGR1 location)
DEFINE CHANNEL(‘CLUS1.FR1’)
CHLTYPE(CLUSSDR)
CONNAME(FR1 location)
DEFINE QLOCAL(Q1)
CLUSTER(CLUS1)
Q1
FR2
QMGR1
FR1
QMGR1
Q1@QMGR1 FR1
FR2
Step 2: Add in more queue managers
DISPLAY CLUSQMGR(*)
DISPLAY QCLUSTER(*)Q1@QMGR1
21

FR1
FR2
QMGR1
DEFINE CHANNEL(‘CLUS1.QMGR1’)
CHLTYPE(CLUSRCVR)
CHLTYPE(CLUSSDR)
DEFINE QLOCAL(Q1)
CLUSTER(CLUS1)
Q1
QMGR2 DEFINE CHANNEL(‘CLUS1.QMGR2’)
CHLTYPE(CLUSRCVR)
CHLTYPE(CLUSSDR)
FR2
QMGR1
QMGR2
FR1
QMGR1
QMGR2
Q1@QMGR1
Q1@QMGR1 FR1
FR2
FR1
FR2
Step 2: Add in more queue managers
22

FR1
FR2
QMGR1
Q1
QMGR2
FR2
QMGR1
QMGR2
FR1
QMGR1
QMGR2
Q1@QMGR1
Q1@QMGR1
Client 1
MQOPEN(Q1)
Q1?
FR1
FR2
QMGR1
FR1
FR2
Q1?
Q1@QMGR1
Q1@QMGR1
Step 3: Start sending messages
23

FR1
FR2
QMGR1
QMGR2
So all you needed…
§ Two full repository queue
managers
§ A cluster receiver
channel each
§ A single cluster sender
each
§ No need to manage pairs
of channels between
each queue manager
combination or their
transmission queues
§ No need for remote
queue definitions
24

Horizontal scaling with
MQ Clustering

Horizontal scaling with MQ Clustering
26
Queue Manager
Earlier we showed how to scale applications
directly across multiple queue managers, with an
MQ Cluster you can do that with queue manager-to-
queue manager message traffic.
A queue manager will typically route messages
based on the name of the target queue
In an MQ Cluster it is possible for multiple queue
managers to independently define the same named
queue
Any queue manager that needs to route messages
to that queue now has a choice…
?

Channel workload balancing
27
App 1App 1Client
Queue Manager
• Cluster workload balancing applies when there are
multiple cluster queues of the same name
• Cluster workload balancing will be applied in one of three ways:
• When the putting application opens the queue - bind on open
• When a message group is started - bind on group
• When a message is put to the queue - bind not fixed
• When workload balancing is applied:
• The source queue manager builds a list of
all potential targets based on the queue name
• Eliminates the impossible options
• Prioritises the remainder
• If more than one come out equal, workload balancing ensues …
• Balancing is based on:
• The channel – not the target queue
• Channel traffic to all queues is taken into account
• Weightings can be applied to the channel
• … this is used to send the messages to the chosen target

28
Client
Queue Manager
Client

29
App 1App 1Client
Queue Manager

30
App 1App 1Client
Queue Manager
Tip: By default, a matching queue on the same
queue manager that the application is connected
to will be prioritized over all others for speed.
To overcome that, look at CLWLUSEQ

Horizontal scaling – do I really need MQ Clustering?
31
App 1App 1Client
Service
Scaled out applications
App 1App 1Client
Service
App 1App 1Client
Service
Q. Is clustering required?
A. Maybe, maybe not …
Service
Single producing application
Client
ServiceService
A. Definitely
A. Definitely
ServiceService
¸
App 1App 1Client
Gateway routing
Service
Queue Manager

Back to the uniform cluster

Building scalable, fault tolerant, solutions
Many of you have built your own continuously
available and horizontally scalable solutions over
the years
Let’s call this the “uniform cluster” pattern
MQ has provided you many of the building blocks -
Client auto-reconnect
CCDT queue manager groups
But you’re left to solve some of the problems,
particularly with long running applications -
Efficiently distributing your applications
Ensuring all messages are processed
Maintaining availability during maintenance
Handling growth and contraction of scale
App App App
decoupled
AppApp
33

Uniform Cluster
34
MQ 9.1.2 started to make that easier
For the distributed platforms, declare a set of
matching queue managers to be following the
uniform cluster pattern
All members of an MQ Cluster
Matching queues are defined on every queue manager
Applications can connect as clients to every queue
manager
MQ will automatically share application connectivity
knowledge between queue managers
The group will use this knowledge to automatically
keep matching application instances balanced
across the queue managers
Matching applications are based on application name
(new abilities to programmatically define this)
MQ 9.1.2 is started to roll out the client support for
this
IBM MQ 9.1.2 CD
Application awareness
https://developer.ibm.com/messaging/2019/03/21/building-scalable-fault-tolerant-ibm-mq-systems/

Application awareness
35
App App
Automatic Application balancing
Application instances can initially connect to any member of
the group
We recommend you use a queue manager group and
CCDT to remove any SPoF
Every member of the uniform cluster will detect an
imbalance and request other queue managers to donate
their applications
Hosting queue managers will instigate a client auto-
reconnect with instructions of where to reconnect to
Applications that have enabled auto-reconnect will
automatically move their connection to the indicated queue
manager
Client support has been increased over subsequent CD
releases. 9.1.2 CD started with support for C-based
applications, 9.1.3 CD added JMS …
App App App App
IBM MQ 9.1.2 CD

© 2019 IBM Corporation36
App App
Automatically handle rebalancing following planned and
unplanned queue manager outages
Existing client auto-reconnect and CCDT queue
manager groups will enable initial re-connection on
failure
Uniform Cluster rebalancing will enable automatic
rebalancing on recovery
App App App App
IBM MQ 9.1.2 CD

37
App App App App App App
IBM MQ 9.1.2 CD
Even to horizontally scale out a queue
manager deployment
Simply add a new queue manager
to the uniform cluster
The new queue manager will
detect an imbalance of
applications and request its fair
share

Uniform Cluster features
38
IBM MQ 9.1.2 CD
As well as the automatic rebalancing of the C library based clients, MQ 9.1.2 CD introduced a number of
new or improved features for the distributed platforms that tie together to make all this possible
This means you need both the queue managers and the clients to be the latest MQ version
Creation of a Uniform Cluster
• A simple qm.ini tuning parameter for now
The ability to identify applications by name, to define grouping
of related applications for balancing
• Extends the existing JMS capability to all languages
Auto reconnectable applications
• Only applications that connect with the auto-reconnect option
are eligible for rebalancing
Text based CCDTs to make it easier to configure this behaviour
• And to allow duplicate channel names
TuningParameters:
UniformClusterName=CLUSTER1
$ export MQAPPLNAME=MY.SAMPLE.APP
https://developer.ibm.com/messaging/2019/03/21/walkthrough-auto-application-rebalancing-using-the-uniform-cluster-pattern/
...
Channels:
DefRecon=YES

© 2019 IBM Corporation39
Balancing by application name
Automatic application balancing is based on the
application name alone
Different groups of application instances with different
application names are balanced independently
By default the application name is the executable
name
This has been customisable with Java and JMS
applications for a while
MQ 9.1.2 CD clients have extended this to other
programming languages
For example C, .NET, XMS, …
Application name can be set either programmatically
or as an environment override
App App App App App App
IBM MQ 9.1.2 CD
https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_9.1.0/com.ibm.mq.dev.doc/q132920_.htm
App App App App
App App

Building scalable and available solutions
JSON CCDT
Build your own JSON format CCDTs
Supports multiple channels of the same name
on different queue managers to simplify the
building of uniform clusters
Available with all 9.1.2 clients
C, JMS, .NET, Node.js, Golang clients
40
IBM MQ 9.1.2 CD
01100110100101
10001010101101
10101011011011
01001011110111
01110111101111
01110111011
{
“channel”:[
{
“name”:”ABC”,
”queueManager”:”A”
},
{
“name”:”ABC”,
”queueManager”:”B”
},
]
}

Configuring the CCDT for application
balancing in a Uniform Cluster
To correctly setup a CCDT for application
rebalancing it needs to contain two entries per
queue manager:
• An entry under the name of a queue
manager group
• And entry under the queue manager’s
real name
(These previously would need to be different
channels, but with the JSON CCDT this is
unnecessary)
The application connects using the queue
manager group as the queue manager name
(prefixed with an ‘*’)
41
IBM MQ 9.1.2 CD
{
"channel":
[
{
"name": ”SVRCONN.CHANNEL",
"type": "clientConnection”
"clientConnection":
{
"connection":
[
{
"host": ”host1",
"port": 1414
}
],
"queueManager": "ANY_QM”
},
},
{
"clientConnection":
{
"connection":
[
{
"host": ”host2",
"port": 1414
}
],
"queueManager": "ANY_QM”
},
},
…
…
{
"clientConnection":
{
"connection":
[
{
"host": ”host1",
"port": 1414
}
],
"queueManager": ”QMGR1”
},
},
{
"clientConnection":
{
"connection":
[
{
"host": ”host2",
"port": 1414
}
],
"queueManager": ”QMGR2”
},
}
]
}
QMGR1QMGR2

Can I decouple any application?

Does this work for all applications?
This pattern of loosely coupled applications only works for certain applications styles.
43
- no
Good
Applications that can tolerate being moved from one
queue manager to another without realising and can
run with multiple instances
• Datagram producers and consumers
• Responders to requests, e.g. MDBs
• No message ordering
Bad
Applications that create persistent state across
multiple messaging operations, or require a single
instance to be running
• Requestors waiting for specific replies
• Dependant on message ordering
• Global transactions …

A new hope for transactions
44
Global transactions require a single resource
manager to be named when connecting. For
MQ a resource manager is a queue manager.
This prevents the use of queue manager
groups in CCDTs
However, WebSphere Liberty 18.0.0.2 and
MQ 9.1.2 CD support the use of CCDT queue
manager groups when connecting
IBM MQ 9.1.2 CD
App
ConnectionFactory
GROUP
{
“channel”:[
{
“name”:”SVRCONN.QM1”,
”queueManager”:”GROUP”
},
{
“name”:”SVRCONN.QM2”,
”queueManager”:”GROUP”
},
]
}

Availability routing in an
MQ Cluster

Clustering for availability
Is MQ Clustering a high availability solution?
46
NO YES
– Having multiple potential targets for any message can improve the availability
of the solution, always providing an option to process new messages.
– A queue manager in a cluster has the ability to route new and old messages
based on the availability of the channels, routing messages to running
queue managers.
– Clustering can be used to route messages to active consuming applications.
Not for the message data.
Each message is only
available from a single queue
manager
Clustering can form a
part of the overall high
availability of the
messaging system

Channel availability routing
47
• When performing workload balancing, the availability of the
channel to reach the target is a factor
• All things being equal, messages will be routed to those targets
with a working channel
Things that can prevent routing
• Applications targeting messages at a specific queue
manage (e.g. reply message)
• Using “cluster workload rank”
• Binding messages to a target
Routing of messages based on availability doesn’t just
happen when they’re first put, it also occurs for queued
transmission messages every time the channel is retried
So blocked messages can be re-routed, if they’re not
prevented…
ServiceService
App 1App 1Client
Service
Queue Manager

Pros and cons of binding
48
Bind context:
Duration of an open
Duration of logical group
• All messages put within the bind context will go
to same target*
• Message order can be preserved**
• Workload balancing logic is only driven at the
start of the context
• Once a target has been chosen it cannot change
• Whether it’s available or not
• Even if all the messages could be redirected
Bind on open Bind on group
Bind context:
None
• Greater availability, a message will be redirected
to an available target***
• Overhead of workload balancing logic for every
message
• Message order may be affected
Bind not fixed
Bind on open is the default
It could be set on the cluster queue (don’t forget
aliases) or in the app
* While a route is known by the source queue manager, it won’t be rebalanced, but it could be DLQd
** Other aspects may affect ordering (e.g. deadletter queueing)
*** Unless it’s fixed for another reason (e.g. specifying a target queue manager)

Application availability routing

App 1App 1Client 1
QMgr
QMgr
QMgr
• Cluster workload balancing does not take into account the
availability of receiving applications
• Or a build up of messages on a queue
Service 1
Service 1
Blissful ignorance
This queue manager is unaware of
the failure to one of the service
instances
Unserviced messages
Half the messages will quickly start
to build up on the service queue
Application based routing
50

Service 1
Service 1
QMgr
QMgr
QMgr
App 1App 1Client 1
51

App 1App 1Client 1
• MQ provides a sample monitoring service tool, amqsclm
• It regularly checks for attached consuming applications (IPPROCS)
• And automatically adjusts the cluster queue definitions to route messages
intelligently (CLWLPRTY)
• That information is automatically distributed around the cluster
Service 1
Service 1
QMgr
QMgr
QMgr
Moving messages
Any messages that slipped through
will be transferred to an active
instance of the queue
Detecting a change
When a change to the open handles
is detected the cluster workload
balancing state is modifiedSending queue managers
Newly sent messages will be sent to active
instances of the queue
52
FR

Cluster Queue Monitoring Sample
– amqsclm, is provided with MQ to ensure messages are directed towards the instances of clustered queues
that have consuming applications currently attached. This allows all messages to be processed effectively
even when a system is asymmetrical (i.e. consumers not attached everywhere).
• In addition it will move already queued messages from instances of the queue where no consumers are attached
to instances of the queue with consumers. This removes the chance of long term marooned messages when
consuming applications disconnect.
– The above allows for more versatility in the use of clustered queue topologies where applications are not
under the direct control of the queue managers. It also gives a greater degree of high availability in the
processing of messages.
– The tool provides a monitoring executable to run against each queue manager in the cluster hosting queues,
monitoring the queues and reacting accordingly.
• The tool is provided as source (amqsclm.c sample) to allow the user to understand the mechanics of the tool and
customise where needed.
53

AMQSCLM Logic
– Based on the existing MQ cluster workload balancing mechanics:
• Uses cluster priority of individual queues – all else being equal, preferring to send messages to instances of queues
with the highest cluster priority (CLWLPRTY).
• Using CLWLPRTY always allows messages to be put to a queue instance, even when no consumers are attached to any
instance.
• Changes to a queue’s cluster configuration are automatically propagated to all queue managers in the cluster that are
workload balancing messages to that queue.
– Single executable, set to run against each queue manager with one or more cluster queues to be monitored.
– The monitoring process polls the state of the queues on a defined interval:
• If no consumers are attached:
– CLWLPRTY of the queue is set to zero (if not already set).
– The cluster is queried for any active (positive cluster priority) queues.
– If they exist, any queued messages on this queue are got/put to the same queue. Cluster workload balancing will
re-route the messages to the active instance(s) of the queue in the cluster.
• If consumers are attached:
– CLWLPRTY of the queue is set to one (if not already set).
– Defining the tool as a queue manager service will ensure it is started with each queue manager
54

Putting it all together

Uniform Cluster
56
App App
Bringing it all together
• Build a matching set of queue managers, in the style of a
uniform cluster
• Make them highly available to prevent stuck messages
• Consider adding amqsclm to handle a lack of consumers
• Setup your CCDTs for decoupling applications from
individual queue managers
• Look at the 9.1.2+ application rebalancing capability and
see if it matches your needs
• Connect your applications
App App App App
amqsclm
amqsclm
amqsclm
decoupled

What about MQ Clusters in
the cloud?

MQ Clusters and Clouds
Not quite…
58
“Cloud platforms provide all an MQ cluster can do”
Clouds often provide cluster-like capability:
Directory services and routing
Network workload balancing
Great for stateless workload balancing
Can be good for balancing unrestricted clients across multiple
But where state is involved, such as reliably sending messages from one
queue manager to another without risking message loss or duplication,
such routing isn’t enough
That’s still the job of an MQ cluster…

MQ Clusters and Clouds
Yes, but think it through first…
59
“So will MQ clusters work in a cloud?”
Some things may be different in your cloud:
o A new expectation that queue managers will be created
and deleted more dynamically than before
o Queue managers will “move around” with the cloud
o Your level of control may be relaxed

Adding and removing queue managers
60
Adding queue managers
o Have a simple clustering topology
o Separate out your full repositories and manage those separately
o Automate the joining of a new queue manager
+
Removing queue managers - this is harder!
o You might have messages on it that you need, how are you going to remove those first?
o If you just switch off the queue manager, it’ll still be known by the cluster for months!
o Is that a problem?
o If routing is based on availability, messages will be routed to alternative queue managers
o Messages will sit on the cluster transmission queues while the queue manager is still known
o It makes your view of the cluster messy
o Automate the cluster removal
o From the deleting queue manager
o Stop and delete the cluster channels – give it a little time to work through to the FRs
o Then from a full repository – as it’ll never be coming back
o RESET the queue manager out of the cluster
_

Adding and removing queue managers
Adding queue managers
– Have a simple clustering topology
– Separate out your full repositories and manage those separately (there’s no need to be adding and removing them
dynamically)
– Automate the joining of a new queue manager
Removing queue managers
– That’s harder!
– You might have messages on it that you need, how are you going to remove those?
– If you just switch off the queue manager, it’ll still be known by the cluster for months!
– Is that a problem?
• If routing is based on availability, messages will be routed to alternative queue managers
– Unless there’s no other choice (e.g. replies being targeted to this queue manager, or no other queue of the same name in the cluster)
• Messages will sit on the cluster transmission queues while the queue manager is still known
– If it had been removed those messages would have failed, or be moved to a DLQ
• Having defunct queue managers still known in the cluster might make your monitoring and problem determination harder
– Automate the cluster removal
• From the deleting queue manager
– Stop and delete the cluster channels – give it a little time to work through to the FRs
• Or from a full repository – as it’ll never be coming back
– RESET the queue manager out of the cluster
61

Queue managers will move around
62
o Usually this is not actually a problem, clouds are doing a good job of hiding
that from you
o e.g. Kubernetes “services” are effectively providing DNS for a container
o Check the cloud capabilities available to you
o Probably best to use hostnames in the connection details rather than IP addresses
o Remember that often names/addresses may only be visible from within the cloud
o External addresses may differ, comma separated CONNAMEs can help here

Control over the cluster
63
Keep it simple
In a potentially self service world, where new applications need their own queue manager, overall
control of configuring everything in the cluster is not desirable. To prevent chaos don’t just pick up
your bespoke, on-prem, cluster topology and put it in the cloud – re-think it:
o Simplify the cluster topology, do you really need all those overlapping clusters?
o Make sure the cluster infrastructure (the full repositories, gateway queue managers) are still under your
control
o Script the joining/leaving a cluster and automate this with the deployment of new queue managers
o A cluster is a shared namespace, enforce naming conventions on cluster queues and topics
o Use your full repositories as a view across the cluster

Clustering good practices

Full repositories
65
o Dedicate a pair of queue managers to being the full
repositories for the cluster
o Don’t even think of them as queue managers
o A pair for redundancy obviously, but why not more?
o More than two won’t work how you hope it will…
o A queue manager will randomly pick two full
repositories to work with (manually defining your
sender channels doesn’t guarantee which two)
o If those two are unavailable it won’t go off and
find another one
FR
QMGRFR

Check your defaults
66
Cluster transmission queues DEFCLXQ
o MQ started with each queue manager having just the one
o It added the ability to automatically split that out into one per cluster channel
o This improves visibility of channel problems
o And reduces the cross-channel impacts of a blocked channel
Cluster workload rank and priority CLWLRANK/CLWLPRTY
o These are used as a tie breaker across multiple queues/channels to determine where messages are sent
o They default to zero (the lowest), so you can raise one to make it more favourable
o But you can’t lower one to make it less favourable
o Consider setting them all to be around the middle (e.g 5) to give yourself options
Default application bind setting DEFBIND
o The default is “on open”
o Great if you need affinity across messages, poor if you value message availability over that
o Consider setting this to “not fixed”

67
Problem diagnosis

Problem diagnosis
Cluster queues are not visible where you think they should be, or messages sit on a transmission
queue for no reason…
68
It’s often:
o A break in communication between queue managers
o Check the PR->FR and FR->PR channels
o A config issue
o Check the host of the resource and that the queue manager has correctly joined the cluster
What to check:
o Configuration – double check the cluster names
o Queue manager cluster knowledge – are the queue managers known to the FRs?
o Queue manager channels – check the PR->FR and FR->PR channels
o Queue manager error logs – look for messages about expiring objects
o System queues - …
Sequence of diagnosis steps:
o Check the queue manager where the cluster definitions live
o Check all the full repositories
o Check the queue manager where you’re sending the messages from

Know your system cluster queues
69
SYSTEM.CLUSTER.COMMAND.QUEUE
o Source of all work for the cluster repository process
o Messages arrive here from local configuration
commands and from FR->PR and PR->FR
o If you delete messages from here you may need a
cluster refresh
o Steady state: empty
SYSTEM.CLUSTER.REPOSITORY.QUEUE
o Persistent store for accumulated cluster knowledge
o All cluster object changes persisted here
o Based on local configuration and remote knowledge
o If you delete messages from here your only option is a
cluster refresh
o Steady state: non-empty
SYSTEM.CLUSTER.TRANSMIT.*
o Used to transfer all messages within a cluster
o Shared between user messages and cluster control
messages
o If you delete messages from here you may need a
cluster refresh, and you may have lost your own
messages!
o Steady state: Ideally empty (may contain messages
whilst channels are down)
SYSTEM.CLUSTER.HISTORY.QUEUE
o Intended for capturing diagnostics for IBM MQ Service
teams
o Only used if REFRESH CLUSTER is issued
o Entire contents of local cluster cache written prior to
processing REFRESH
o Messages expire after 180 days
o Steady state: ignore it

Thank You
David Ware
Chief Architect, IBM MQ
dware@uk.ibm.com
www.linkedin.com/in/dware1

Fault tolerant and scalable ibm mq

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

Similaire à Fault tolerant and scalable ibm mq

Similaire à Fault tolerant and scalable ibm mq (20)

Plus de David Ware

Plus de David Ware (6)

Dernier

Dernier (20)

Fault tolerant and scalable ibm mq