Can a webserver determine if it's the active node of an HA failover system without hard-coding anything on the server itself? - language-agnostic

I can think of a few hacks using ping, the box name, and the HA shared name, but I think they lead to data leakage.
Should a box even know it's part of an HA cluster, or what that cluster's name is? Is this more a function of DNS? Is there some API exposed for boxes to join an HA cluster and request the ID of the currently active node?
I want to differentiate between the inactive node and the active node in the alerting mechanisms of a running program. If the active node is alerting, I want to hit a pager; on the inactive node, I want to send an email. Pushing the determination into the alerting layer just moves the same problem elsewhere.
EASY SOLUTION: Polling the server from an external agent that connects through the network makes any shell game of who is the active node a moot point. To clarify, the only thing that will page is the remote agent monitoring the real server. Each box can send emails all day long for all I care.

It really depends on the HA system you're using.
For example, if your system uses a shared IP and the traffic is managed by some hardware box, it can be hard to determine whether a given box is the master or the slave. That really depends on the specific solution... As long as you can add a custom script to the supervisor, you should be OK: for example, the controller can ping a daemon on the master server every second, and the alerting script simply checks whether the time since the last ping is under 2 seconds...
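A minimal sketch of that check, assuming the controller (or a small local daemon) refreshes a marker file on the active node every second; the file path and the pager/email hooks are placeholders:
#!/bin/sh
# Hypothetical marker file, refreshed every second on the active node only
# (e.g. by the cluster controller over ssh, or by a tiny local daemon).
MARKER=/var/run/ha-active.stamp

now=$(date +%s)
last=$(stat -c %Y "$MARKER" 2>/dev/null || echo 0)

if [ $((now - last)) -lt 2 ]; then
    echo "this node is active: hit the pager"   # plug in your paging call here
else
    echo "this node is standby: send an email"  # plug in your mailer here
fi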
If your system doesn't have a supervisor / controller node, but each node tries to determine the state itself, you can have more problems. If a split brain occurs, you can end up with both slaves or both masters, so your alerting software will be wrong in both cases. Gadgets that can ensure only one live node (STONITH and others) could help.
On the other hand, in the second scenario, if the HA software works properly on both hosts, you should be able to obtain the master/slave information straight from it. It has to know its own state at any time, because that's one of its main functions. In most HA solutions you should be able to either query the current state or hook in some code to run when the state changes. Heartbeat offers both.
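For instance, if the HA layer happens to be Heartbeat, something along these lines might work (cl_status ships with Heartbeat, but the subcommands and output differ between versions, so treat this as a sketch):
#!/bin/sh
# "local" or "all" means cluster resources are running here, i.e. this node is
# currently active; anything else (foreign/none/transition) is treated as standby.
case "$(cl_status rscstatus 2>/dev/null)" in
    local|all) echo "active"  ;;
    *)         echo "standby" ;;
esac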
I wouldn't worry about edge cases like split brain, though. Almost any situation in which you lose the connection between the clustered nodes will be more important than the stuff that happens on the separate nodes :)
If the thing you care about is really logging / alerting only, then ideally you could have a separate logger box which gets all the information about the current network / cluster status. An external box will probably have a better idea of how to deal with the situation. If your cluster gets DoS'ed / disconnected from the network / loses power, you won't get any alert. A redundant pair of independent monitors can save you from that.
I'm not sure why you mentioned DNS - due to its refresh time it shouldn't be a source of any "real-time" cluster information.

One way is to get each box to export its own idea of whether it is active into your monitoring. From there you can predicate paging/emailing on this status (with a race condition around failover), and alert when none or too many systems believe they are active.
Another option is to monitor the active system via a DNS alias (or some other method to address the active system) and page on that. Then also monitor all the systems, both active and inactive, and email on that. This will cause duplicate alerts for the active system, but that's probably okay.
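A rough sketch of that second option (all hostnames, URLs, and the health endpoint below are placeholders, and the alias is assumed to always resolve to the active node):
#!/bin/sh
# ha-alias.example.com resolves to whichever node is currently active;
# node1/node2 are the individual machines.
ALIAS=ha-alias.example.com

# Page if the service behind the alias (i.e. the active node) is down.
curl -fsS --max-time 5 "http://$ALIAS/health" >/dev/null \
    || echo "active node down" | mail -s "PAGE: $ALIAS unreachable" pager@example.com

# Email (only) for each individual node, active or not - this duplicates the
# alert for the active node, as noted above.
for node in node1.example.com node2.example.com; do
    curl -fsS --max-time 5 "http://$node/health" >/dev/null \
        || echo "$node down" | mail -s "WARN: $node unreachable" admins@example.com
done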
It's hard to be more specific without knowing more about your setup.

As a rule, the machines in an HA cluster shouldn't really know which one is active. There's one exception, mind, and that's cronjobs. At work, we have an HA cluster on top of which some rather important services run. Some of those services have cronjobs, and we only want them running on the active box. To do that, we use this shell script:
#!/bin/sh
# Replace 0.0.0.0 with the external (shared) IP of your HA cluster.
HA_CLUSTER_IP=0.0.0.0
# Only run the wrapped command if this box currently holds the cluster IP.
if ip addr | grep -q "inet $HA_CLUSTER_IP/"; then
    exec "$@"
fi
(Note that this is running on Debian.) What this does is check to see if the current box is the active one within the cluster (replace 0.0.0.0 with the external IP of your HA cluster), and if so, executes the command passed in as arguments to the script. This ensures that one and only one box is ever actually executing the cronjobs.
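As a usage sketch (the script and job paths here are made up), a crontab entry on every node wraps the real job with the script above, so it only actually runs on whichever node is active:
# Run the nightly report at 02:00, but only on the active node.
0 2 * * * /usr/local/bin/run-if-active.sh /usr/local/bin/nightly-report.sh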
Other than that, there's really no reason I can think of why you'd need to know which box is the active one.
UPDATE: Our HA cluster uses Heartbeat to assign the cluster's external IP address as a secondary address to the active machine in the cluster. Programmatically, you can check whether your machine is the current active box by calling gethostbyname() and iterating over the data returned until you either reach the end or find the cluster's IP in the list.
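A rough shell equivalent of that check (the cluster IP is a placeholder, and whether the shared address shows up in name resolution at all depends on your /etc/hosts or DNS setup, so treat this as a sketch; checking ip addr as in the cron script above is an alternative):
#!/bin/sh
# Hypothetical: the cluster's shared IP as assigned by Heartbeat.
CLUSTER_IP=192.0.2.10

# Walk every address resolved for this host's name and look for the cluster IP,
# the shell analogue of iterating over gethostbyname()'s address list.
if getent ahosts "$(hostname)" | awk '{print $1}' | grep -qx "$CLUSTER_IP"; then
    echo "active"
else
    echo "standby"
fi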

Without hard-coding... ? I assume you mean some native Heartbeat query; I'm not sure. However, you could use ifconfig: HA creates a virtual interface on whatever interface it is configured to run on. For instance, if HA is configured on eth0, it will create a virtual interface called eth0:0, but only on the active node.
Therefore you can do a simple query of the ifconfig output to determine whether the server is the active node or not. For example, if eth0 is the configured interface:
ACTIVE_NODE=`ifconfig | grep -c 'eth0:0'`
That will set the $ACTIVE_NODE variable to 1 (active) or 0 (standby). Hope that helps.
http://www.of-networks.co.uk

Related

Is there a way to keep track of the calls being done in mysql server by a web app?

I'm finishing a system at work that makes calls to a MySQL server. Those calls' arguments reveal information that I need to keep private, like vote(idUser, idCandidate). There's no information in the db that relates those two, of course, nor in "the visible part" of the back end, but even though I think this can't be done, I wanted to make sure that it is impossible to trace these sorts of calls with a log or something (calls that were made, or calls being made at the moment), while the system is in production and being used, just as it is impossible in most languages unless you specifically "debug" in a certain way. I hope the question is clear enough. Thanks.
How do I log thee? Let me count the ways.
MySQL query log. I can enable this per-session and send everything to a log file (see the sketch after this list).
I can set up a slave server and have insertions sent to me by the master. This is a significant intervention and would leave a wide trace.
On the server, unbeknownst to either the Web app or MySQL, I can intercept the communications between the two. I need administrative access to the machine, of course.
On the server, again with administrative access, I can both log the query calls and inject logging instrumentation into the SQL interface (the legitimate one is the MySQL Audit Plugin, but there are several alternatives, developed for various purposes over the years).
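As a sketch of the first way (the log path and credentials are placeholders, and this needs the SUPER privilege on the server):
# Turn on the general query log at runtime and point it at a file.
mysql -u root -p -e "
    SET GLOBAL general_log_file = '/var/log/mysql/general.log';
    SET GLOBAL general_log = 'ON';"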
What can you do? You can have the applications use a secure protocol, just for starters.
Then you need to secure the machine so that those administrator tricks don't work: even if the logs are activated, nobody should be able to read them, and you should be notified of any new or modified file so you can delete it promptly.

Openshift HAproxy sticky session issue

I have a deployment with 2 pods of a web application. The web application requires a logon and maintains a session.
After I kill the first pod I am automatically redirected to the logon page of the second pod, but when the first pod comes back up I am redirected back to it.
I have tried the HAProxy "balance source" algorithm and cookies.
Any idea why it doesn't stay with the second pod?
balance source uses a hashing algorithm that changes the workload distribution every time the number of available backends changes, because that is what it's designed to do. If you had more than 2 backends, you would also find that taking down any one backend will cause some traffic that wasn't even hitting the impacted backend to shift to another, because of this redistribution.
If the hash result changes due to the number of running servers changing, many clients will be directed to a different server.
http://cbonte.github.io/haproxy-dconv/1.6/configuration.html#4-balance
For an explanation of why you didn't see the expected behavior when using cookies instead of balance source, we'd need to see your configuration.
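For reference, cookie-based persistence in HAProxy usually looks something like the snippet below (backend and server names and addresses are made up, and OpenShift generates its own config, so this is only a sketch of the shape to look for):
backend app
    balance roundrobin
    # Insert a cookie naming the server that handled the first request;
    # later requests carrying that cookie go back to the same server,
    # independent of how many servers are currently up.
    cookie SERVERID insert indirect nocache
    server pod1 10.1.0.11:8080 check cookie pod1
    server pod2 10.1.0.12:8080 check cookie pod2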

Connections Option in RDS Mysql and best way to handle many connections

The image below shows current activity as 99 connections.
How exactly is it counted?
RDS is accessed through Node.js web services and a PHP website. Every time I do some operations I close the connection. But after closing, the count doesn't decrease; it just keeps increasing. Later I got the "too many connections" error message once the connections reached 608. I restarted and then it worked. I have never seen the count decrease.
So what is the best way to handle this?
Below is the image showing what I get when I run SHOW FULL PROCESSLIST;
PHP-based web pages that use a MySQL connection generally exit as soon as they're done rendering page content, so the connection gets closed whether you explicitly call a mysqli or PDO close method or not.
The same is not true of Node services, which run for a long time and can therefore easily leak resources. It's probable that you're opening connections, but not closing them, in your Node service, which would produce the sort of behavior you're seeing here. (This is an easy mistake to make, especially for those of us whose background is largely in more ephemeral PHP scripts.)
One good way to identify the problem is to connect to the MySQL instance via Workbench or the console monitor and issue SHOW FULL PROCESSLIST; to get a list of currently active connections, their originating hosts, and the queries (if any) they are executing. This may help you narrow down the source of the leaking connections, so that you can identify the code at fault and repair it.
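For example (the endpoint and user are placeholders), from any box that can reach the RDS instance:
# List every open connection, its client host, and what it is running right now.
mysql -h mydb.xxxxxxxx.us-east-1.rds.amazonaws.com -u admin -p \
      -e "SHOW FULL PROCESSLIST;"

# A quick running total of open connections.
mysql -h mydb.xxxxxxxx.us-east-1.rds.amazonaws.com -u admin -p \
      -e "SHOW STATUS LIKE 'Threads_connected';"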

How do I use the Zabbix 2.2 JSON API to determine host status?

I'm having to modify a Zabbix traffic light web page that shows the general availability or status of some hosts.
The update is because I'm upgrading from version 1.8 to 2.2. The status field is no longer used.
According to what I've been reading on the web and on the Zabbix website, the general way to determine availability is now to use an agent.ping item and a nodata() trigger on it.
How do I implement that in practice?
https://www.zabbix.com/documentation/2.2/manual/api/reference/trigger/get
It's been a while since you asked this question; nevertheless, I hope someone might find my reply useful :)
You could consider examining the host object, where the status of each interface (Zabbix agent, SNMP, IPMI, JMX) is reflected.
https://www.zabbix.com/documentation/2.2/manual/api/reference/host/object
This has downsides, however. A specific interface could be reported "down" for many reasons (credentials changed, firewall changed, daemon died, etc.). That's why I chose this approach:
having one item which pings regularly
having one item which pulls data regularly (in my case Zabbix Agent or SNMP)
having one trigger for "ping fails", another for "Zabbix Agent fails" (or SNMP fails). That's where you'd use nodata(). Assign it a medium severity.
having one more trigger which checks for ping and zabbix agent failure (with a high severity) - that's my dead host detection. It gets escalated 7x24.
optional: define dependencies on the triggers so that you only get one event (host dead) if both ping & SNMP/Zabbix fail
put all this into one template and assign it to the respective hosts
Now you can rely on the "host dead" trigger (it's always there no matter whether you use ping & SNMP/Zabbix/JMX/whatever), which is much more relevant than the default "interface works" status field from the host object.
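And to tie this back to the JSON API part of the question, here is a sketch of reading that trigger's current state over the API (the URL, credentials, and trigger description are placeholders for your setup):
# Log in and grab the auth token (crude extraction; use jq if you have it).
AUTH=$(curl -s -H 'Content-Type: application/json-rpc' \
    -d '{"jsonrpc":"2.0","method":"user.login","params":{"user":"api-user","password":"secret"},"id":1}' \
    http://zabbix.example.com/api_jsonrpc.php | sed 's/.*"result":"\([^"]*\)".*/\1/')

# Fetch the "Host dead" trigger; "value" is 0 when OK and 1 when in problem state.
curl -s -H 'Content-Type: application/json-rpc' \
    -d '{"jsonrpc":"2.0","method":"trigger.get","params":{"output":["description","value"],"filter":{"description":"Host dead"},"selectHosts":["host"]},"auth":"'"$AUTH"'","id":2}' \
    http://zabbix.example.com/api_jsonrpc.php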

How do I allow any host to submit jobs in Oracle/Sun Grid Engine?

We have an internal network devoted to development and testing, and this network has an OGE cluster on it. I'd like to allow any machine on that network to submit jobs, without having to add them manually one by one as submit hosts. I've tried doing a wildcard, but it hasn't liked my syntax. Is there any way to do this?
Thanks!
A qualified "no" - if you really need this, consider automating it instead.
GridEngine does not support wildcarding in host names. GE relies heavily on forward and reverse name resolution for pretty much all host interactions. You are not going to get a GridEngine cluster to blindly accept job submissions from any unspecified machine on your subnet without some bad magic.
If you use a configuration management system like Puppet or Chef, that might be the best layer to define whether a server is a submit host or not.
The alternative, brute-force way (which will almost certainly violate your IT department's acceptable use policy) is to use something like nmap to produce a list of hostnames on your network (if you think you can get away with it) and write a simple shell script to add each one as a submit host. This approach would require minor ongoing maintenance as the hosts on your network change over time, etc.
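A minimal sketch of that script (hosts.txt is a placeholder for however you generate the list):
#!/bin/sh
# Add every hostname listed in hosts.txt as a Grid Engine submit host.
while read -r host; do
    qconf -as "$host"
done < hosts.txt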