How does cluster sender and receiver channel usage differ between Full Repository and Partial Repository queue managers in IBM WebSphere MQ?
On the Full Repository:
- the queue manager's cluster receiver channel must point to itself; this is how other queue managers in the cluster will know how to reach the FR.
- the cluster sender channel must point to another Full Repository.
On the Partial Repository:
- the queue manager's cluster receiver channel must also point to itself; this is how other queue managers in the cluster will know how to reach it.
- the cluster sender channel must point to one of the Full Repository queue managers; this is the FR the PR will rely on for cluster object resolution.
Notes:
1. Your cluster should have 2 Full Repositories; each FR's sender channel should point to the other FR.
2. Your Partial Repositories should be configured to point to one of these 2 Full Repositories; a good habit is to assign them equally between the FRs.
A cluster receiver definition is how other qmgrs in the cluster will talk back to that queue manager; it acts like a template for how to talk to the qmgr.
A cluster sender definition creates the initial channel a queue manager in a cluster uses to find a full repository for that cluster. This is a manual cluster sender. It doesn't matter whether you are a full or partial repository: you need a manually defined cluster sender pointing to a full repository.
Subsequent connections from one queue manager to another are made using 'auto' cluster senders. A cluster queue manager queries the full repository for information about a destination it needs to connect to (e.g. it hosts a queue that is the destination for a message). The information retrieved is based on the cluster receiver of the destination, hence my comment that a clusrcvr is the 'template' for connections to that queue manager.
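The setup above can be sketched in MQSC. The channel names, hostnames, ports, and cluster name below are placeholders for illustration, not values from the question:

```
* On Full Repository FR1 (define a mirrored pair on FR2)
ALTER QMGR REPOS(DEMOCLUSTER)
DEFINE CHANNEL(TO.FR1) CHLTYPE(CLUSRCVR) TRPTYPE(TCP) +
       CONNAME('fr1.example.com(1414)') CLUSTER(DEMOCLUSTER)
DEFINE CHANNEL(TO.FR2) CHLTYPE(CLUSSDR) TRPTYPE(TCP) +
       CONNAME('fr2.example.com(1414)') CLUSTER(DEMOCLUSTER)

* On Partial Repository PR1: receiver points to itself, sender to an FR
DEFINE CHANNEL(TO.PR1) CHLTYPE(CLUSRCVR) TRPTYPE(TCP) +
       CONNAME('pr1.example.com(1414)') CLUSTER(DEMOCLUSTER)
DEFINE CHANNEL(TO.FR1) CHLTYPE(CLUSSDR) TRPTYPE(TCP) +
       CONNAME('fr1.example.com(1414)') CLUSTER(DEMOCLUSTER)
```

Note that the CLUSRCVR's CONNAME names the defining queue manager itself; that is what gets published to the repository as the "template" for reaching it.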
Related
My understanding is that Zookeeper is often used to solve the problem of "keeping track of which node plays a particular role" in a distributed system (e.g. master node in a DB or in a MapReduce cluster, etc).
For simplicity, say we have a DB with one master and multiple replicas and the current master node in the DB goes down. In this scenario, one would, in principle, make one of the replica nodes a new master node. At this point my understanding is:
If we didn't have Zookeeper
The application servers may not know that we have a new master node, so they would not know where to send writes unless we have some custom logic on the app server itself to detect / correct this problem.
If we have Zookeeper
Zookeeper would somehow detect this failure, and update the value for the corresponding master key. Moreover, application servers can (optionally?) register hooks in Zookeeper, so Zookeeper can notify them of this failure, so that the app servers can update (e.g. in memory), which DB node is the new master.
My questions are:
How does Zookeeper know what node to make master? Is Zookeeper responsible for this choice?
How is this information propagated to nodes that need to interact with Zookeeper? E.g. If one of the Zookeeper nodes go down, how would the application servers know which Zookeeper node to hit in this scenario? Does Zookeeper manage this differently from competing solutions like e.g. etcd?
The answer to both 1. and 2. is the leader election process, which briefly works in the following way:
When a process starts in a cluster managed by ZK, the cluster enters an election state. If there is already a leader, the established hierarchy holds and the existing leader is simply verified. If there is no leader (say the master is down), ZK forces the znodes to use sequence flags to look for a new leader. Each node talks to its peers, sending a message containing the node's identifier (sid) and the most recent transaction it executed (zxid). These messages are called votes. When a node receives a vote, it either keeps or discards it depending on the zxid: if the received zxid is newer than the one it holds, it keeps the vote; if older, it discards it. If there is a tie in zxids, the vote with the highest sid wins! So there will come a time when all nodes hold the same vote, which determines the new leader by its sid. That is how ZK elects a new leader node.
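The vote comparison can be sketched in Python. This is a simplification I'm adding for illustration; real ZooKeeper elections also track election epochs and require a quorum of nodes to converge on the same vote:

```python
# Simplified sketch of ZooKeeper's vote comparison: each vote is a
# (zxid, sid) pair, so Python tuple ordering encodes the rule directly
# (newest zxid wins; highest sid breaks ties).
def better_vote(current, proposed):
    """Keep the proposed vote only if it beats the one we hold."""
    return proposed if proposed > current else current

votes = [(5, 1), (7, 2), (7, 3), (6, 4)]  # (zxid, sid) seen from peers
winner = votes[0]
for v in votes[1:]:
    winner = better_vote(winner, v)
print(winner)  # (7, 3): zxid 7 is newest, and sid 3 beats sid 2 in the tie
```

Once every node has run the same comparison over the same set of votes, they all agree on the winner without any central coordinator.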
I'm trying to automate the deployment of OpenShift Origin into AWS, because it's a dependency of another product that I also need to deploy on demand. There are various solutions for this, but they all require a Pool ID at some point in the process. What is a Pool ID? I realise it's associated with a Red Hat subscription, but can I script the generation of a Pool ID? And if so, is it necessary to treat it as a secret?
You can obtain the available subscription pools with:
subscription-manager list --available --pool-only
If you have many subscriptions, you can filter the result with the --matches option (the filter can contain the wildcards * and ?):
--matches=FILTER_STRING
lists only subscriptions or products containing the
specified expression in the subscription or product
information, varying with the list requested and the
server version (case-insensitive).
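To answer the scripting part of the question, a sketch of capturing a pool ID on a registered system (the --matches pattern here is an assumption; adapt it to whatever your subscription is called):

```
# Capture the first matching pool ID, then attach it.
POOL_ID=$(subscription-manager list --available --pool-only --matches='*OpenShift*' | head -n 1)
subscription-manager attach --pool="$POOL_ID"
```

The pool ID is generated by the entitlement server for your account, not something you mint yourself, so scripting means looking it up rather than creating it.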
In Couchbase documentation: https://developer.couchbase.com/documentation/server/current/concepts/distributed-data-management.html
There is no concept of master nodes, slave nodes, config nodes, name nodes, head nodes, etc, and all the software loaded on each node is identical
But in my logs I get the message found in post:
https://forums.couchbase.com/t/havent-heard-from-a-higher-priority-node-or-a-master-so-im-taking-over/5924
Haven't heard from a higher priority node or a master, so I'm taking over. mb_master 000 ns_1#10.200.0.10 1:07:38 AM Tue Feb 7, 2017
and
Somebody thinks we're master. Not forcing mastership takover over ourselves mb_master 000 ns_1#10.200.0.10 1:07:28 AM Tue Feb 7, 2017
I am having trouble finding out what the master does, because any search for a master just turns up the statement that Couchbase has no master node.
The error messages seem to originate from the cluster manager, which should look like this (I didn't manage to find the Couchbase implementation of it; the link points to the implementation of membase, the predecessor of Couchbase).
While all nodes are equal in Couchbase, this is not the case when there is some redistribution of data. As described in detail in this document, a master is chosen to manage the redistribution. The log messages you see are produced by this process.
The Master Node in the cluster manager is also known as the orchestrator.
Straight from the Couchbase Server 4.6 documentation, https://developer.couchbase.com/documentation/server/4.6/concepts/distributed-data-management.html
Although each node runs its own local Cluster Manager, there is only
one node chosen from among them, called the orchestrator, that
supervises the cluster at a given point in time. The orchestrator
maintains the authoritative copy of the cluster configuration, and
performs the necessary node management functions to avoid any
conflicts from multiple nodes interacting. If a node becomes
unresponsive for any reason, the orchestrator notifies the other nodes
in the cluster and promotes the relevant replicas to active status.
This process is called failover, and it can be done automatically or
manually. If the orchestrator fails or loses communication with the
cluster for any reason, the remaining nodes detect the failure when
they stop receiving its heartbeat, so they immediately elect a new
orchestrator. This is done immediately and is transparent to the
operations of the cluster.
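The heartbeat-timeout failover the quote describes can be sketched generically. This is an assumption-laden illustration of the idea, not Couchbase's actual ns_server implementation; the timeout and the lowest-name tiebreak are placeholders:

```python
# Generic sketch of heartbeat-based orchestrator failover: if the current
# orchestrator's heartbeat is stale, the surviving nodes deterministically
# agree on a replacement (here: the lexicographically lowest node name).
HEARTBEAT_TIMEOUT = 5.0  # seconds of silence before re-election (assumption)

def elect_orchestrator(nodes, last_heartbeat, now, current):
    """Return the orchestrator after checking its heartbeat at time `now`."""
    if now - last_heartbeat.get(current, 0.0) <= HEARTBEAT_TIMEOUT:
        return current                    # orchestrator still healthy
    survivors = [n for n in nodes if n != current]
    return min(survivors)                 # every node computes the same pick

nodes = ["ns_1@10.200.0.10", "ns_1@10.200.0.11", "ns_1@10.200.0.12"]
beats = {"ns_1@10.200.0.10": 90.0}        # orchestrator last heard at t=90
print(elect_orchestrator(nodes, beats, now=100.0, current="ns_1@10.200.0.10"))
```

Because every node applies the same deterministic rule to the same membership list, they converge on the same new orchestrator without a central coordinator, which is what makes the failover transparent.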
I have a queue AIS.CICSUD1.BROKER.DATA accessed by different process IDs such as BO01, BO02, BO03.
Can I create the same queue for different process IDs? I tried it in WebSphere MQ Explorer but it gives me a duplicate queue error.
My queue manager is on my local machine and I need to access the queues only from my local machine.
Please let me know
Queues must be unique on a given queue manager. All of the different types of queue (QLocal, QRemote, QAlias, QModel) share the same namespace. Typically in your situation some identifier would be added to the queue name. In this case, adding the process ID seems to be the easiest approach.
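For example, in MQSC (a sketch of the naming pattern only; the suffixed names are an illustration, not something from your configuration):

```
DEFINE QLOCAL('AIS.CICSUD1.BROKER.DATA.BO01')
DEFINE QLOCAL('AIS.CICSUD1.BROKER.DATA.BO02')
DEFINE QLOCAL('AIS.CICSUD1.BROKER.DATA.BO03')
```

Each process then opens the queue carrying its own suffix, and the names no longer collide on the queue manager.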
I can think of a few hacks using ping, the box name, and the HA shared name but I think that they are leading to data leakage.
Should a box even know it's part of an HA cluster, or what that cluster's name is? Is this more a function of DNS? Is there some API exposed for boxes to join an HA cluster and request the ID of the currently active node?
I want to differentiate between the inactive node and active node in alerting mechanisms for a running program. If the active node is alerting I want to hit a pager and on the inactive node I want to send an email. Pushing the determination into the alerting layer moves the same problem elsewhere.
EASY SOLUTION: Polling the server from an external agent that connects through the network makes any shell game of who is the active node a moot point. To clarify this the only thing that will page is the remote agent monitoring the real. Each box can send emails all day long for all I care.
It really depends on the HA system you're using.
For example, if your system uses a shared IP and the traffic is managed by some hardware box, then it can be hard to determine if a certain box is a master or slave. That will depend on a specific solution really... As long as you can add a custom script to the supervisor, you should be ok - for example the controller can ping a daemon on the master server every second. In the alerting script, simply check if the time of the last ping < 2 sec...
If your system doesn't have a supervisor / controller node, but each node tries to determine the state itself, you can have more problems. If a split brain occurs, you can end up with both slaves or both masters, so your alerting software will be wrong in both cases. Gadgets that can ensure only one live node (STONITH and others) could help.
On the other hand, in the second scenario, if the HA software works on both hosts properly, you should be able to obtain the master/slave information straight from it. It has to know its own state at any time, because it's one of its main functions. In most HA solutions you should be able to either get the current state, or add some code to run when the state changes. Heartbeat offers both.
I wouldn't worry about the edge cases like a split brain though. Almost any situation when you lose connection between the clustered nodes will be more important than the stuff that happens on the separate nodes :)
If the thing you care about is really logging / alerting only, then ideally you could have a separate logger box which gets all the information about the current network / cluster status. External box will probably have better idea how to deal with the situation. If your cluster gets dos'ed / disconnected from the network / loses power, you won't get any alert. A redundant pair of independent monitors can save you from that.
I'm not sure why you mentioned DNS - due to its refresh time it shouldn't be a source of any "real-time" cluster information.
One way is to get the box to export its idea of whether it is active into your monitoring. From there you can predicate paging/emailing on this status (with a race condition around failover), and alert when none or too many systems believe they are active.
Another option is to monitor the active system via a DNS alias (or some other method to address the active system) and page on that. Then also monitor all the systems, both active and inactive, and email on that. This will cause duplicate alerts for the active system, but that's probably okay.
It's hard to be more specific without knowing more about your setup.
As a rule, the machines in a HA cluster shouldn't really know which one is active. There's one exception, mind, and that's with cronjobs. At work, we have a HA cluster on top of which some rather important services run. Some of those services have cronjobs, and we only want them running on the active box. To do that, we use this shell script:
#!/bin/sh
HA_CLUSTER_IP=0.0.0.0
# Run the given command only if this box currently holds the cluster IP.
if ip addr | grep -q "$HA_CLUSTER_IP"; then
    eval "$@"
fi
(Note that this is running on Debian.) What this does is check to see if the current box is the active one within the cluster (replace 0.0.0.0 with the external IP of your HA cluster), and if so, executes the command passed in as arguments to the script. This ensures that one and only one box is ever actually executing the cronjobs.
Other than that, there's really no reasons I can think of why you'd need to know which box is the active one.
UPDATE: Our HA cluster uses Heartbeat to assign the cluster's external IP address as a secondary address to the active machine in the cluster. Programmatically, you can check to see if your machine is the current active box by calling gethostbyname(), and iterating over the data returned until you either get to the end or you find the cluster's IP in the list.
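A sketch of that gethostbyname() check in Python (the cluster IP below is a placeholder; substitute your cluster's external address):

```python
# Decide whether this host is the active node by checking whether the
# cluster's floating IP is among the host's resolved addresses.
CLUSTER_IP = "192.0.2.10"  # placeholder for your cluster's external IP

def is_active_node(cluster_ip, local_ips):
    """True if this host currently holds the cluster's floating IP."""
    return cluster_ip in local_ips

# On a real node, gather the addresses first, e.g.:
#   _, _, local_ips = socket.gethostbyname_ex(socket.gethostname())
print(is_active_node(CLUSTER_IP, ["10.0.0.5", "192.0.2.10"]))  # True
```

The membership test is the whole trick: Heartbeat adds the floating IP as a secondary address on the active machine only, so the check flips automatically at failover.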
Without hard-coding...? I assume you mean some native heartbeat query; I'm not sure. However, you could use ifconfig: HA creates a virtual interface on whatever interface it is configured to run on. For instance, if HA were configured on eth0, it would create a virtual interface eth0:0, but only on the active node.
Therefore you could do a simple query of the ifconfig output to determine whether the server is the active node. For example, if eth0 was the configured interface:
ACTIVE_NODE=`ifconfig | grep -c 'eth0:0'`
That will set the $ACTIVE_NODE variable to 1 (for active) and 0 (if standby). Hope that may help.