If I have multiple Hazelcast cluster members using the same IMap and I want to configure the IMap in a specific manner programmatically, do I then need to have the configuration code in all the members, or should it be enough to have the configuration code just once in one of the members?
In other words, are MapConfigs only member-specific, or are they cluster-wide?
I'm asking because the Hazelcast documentation (http://docs.hazelcast.org/docs/latest/manual/html-single/index.html#configuring-programmatically) says:
As dynamically added data structure configuration is propagated across
all cluster members, failures may occur due to conditions such as
timeout and network partition. The configuration propagation mechanism
internally retries adding the configuration whenever a membership
change is detected.
This gives me the impression that the configurations propagate.
Now if member A specifies a certain MapConfig for IMap "testMap", should member B see that config when it does
hzInstance.getConfig().findMapConfig("testMap") // or .getMapConfig("testMap")
In my testing B did not see the MapConfig done by A.
I also tried specifying mapConfig.setTimeToLiveSeconds(60) at A and mapConfig.setTimeToLiveSeconds(10) at B.
It seemed that the items in the IMap that were owned by A were evicted in 60 seconds, while the items owned by B were evicted in 10 seconds. This supports the idea that each member needs to do the same configuration if I want consistent behaviour for the IMap.
Each member owns certain partitions of the IMap. A member's IMap configuration has effect only on its owned partitions.
So it is normal to see different TTL values of the entries of the same map in different members when they have different configurations.
As you said, all members should have the same IMap configuration to get consistent, cluster-wide behavior.
Otherwise, each member will apply its own configuration to its own partitions.
But if you add a dynamic configuration as described here, that configuration is propagated to all members and changes their behavior as well.
In brief, if you add the configuration before creating the instance, it is a local configuration. But if you add it after creating the instance, it is a dynamic configuration and propagates to all members.
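For illustration, a minimal sketch of the two paths, assuming Hazelcast 3.9 or newer (where dynamic configuration was introduced); the map names are just placeholders:

import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class MapConfigDemo {
    public static void main(String[] args) {
        // Local configuration: added to the Config object BEFORE the instance
        // is created. It only affects the partitions this member owns, so every
        // member must repeat it to get consistent behaviour.
        Config config = new Config();
        config.addMapConfig(new MapConfig("testMap").setTimeToLiveSeconds(60));
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);

        // Dynamic configuration: added AFTER the instance is created.
        // This one is propagated to all current and future cluster members.
        hz.getConfig().addMapConfig(new MapConfig("dynamicMap").setTimeToLiveSeconds(60));
    }
}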
My understanding is that Zookeeper is often used to solve the problem of "keeping track of which node plays a particular role" in a distributed system (e.g. master node in a DB or in a MapReduce cluster, etc).
For simplicity, say we have a DB with one master and multiple replicas and the current master node in the DB goes down. In this scenario, one would, in principle, make one of the replica nodes a new master node. At this point my understanding is:
If we didn't have Zookeeper
The application servers may not know that we have a new master node, so they would not know where to send writes unless we have some custom logic on the app server itself to detect / correct this problem.
If we have Zookeeper
Zookeeper would somehow detect this failure, and update the value for the corresponding master key. Moreover, application servers can (optionally?) register hooks in Zookeeper, so Zookeeper can notify them of this failure, so that the app servers can update (e.g. in memory), which DB node is the new master.
My questions are:
How does Zookeeper know which node to make master? Is Zookeeper responsible for this choice?
How is this information propagated to nodes that need to interact with Zookeeper? E.g. if one of the Zookeeper nodes goes down, how would the application servers know which Zookeeper node to hit in this scenario? Does Zookeeper manage this differently from competing solutions such as etcd?
The answer to both 1. and 2. is the leader election process, which briefly works in the following way:
When a process starts in a cluster managed by ZK, the cluster enters an election state. If there is a leader, an established hierarchy already exists and the existing leader is simply verified. If there is no leader (say the master is down), ZK forces the znodes to use sequence flags to look for a new leader. Each node talks to its peers and sends a message containing the node's identifier (sid) and the most recent transaction it executed (zxid). These messages are called votes. When a node receives a vote, it either keeps it or discards it depending on the zxid: if the zxid is newer than its own it keeps the vote, if older it discards it. If there is a tie in zxids, the vote with the highest sid wins! So eventually all nodes hold the same vote, which defines the new leader by its sid. That is how ZK elects a new leader node!
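As a rough sketch of the vote comparison rule described above (newer zxid wins, highest sid breaks ties); the Vote type here is hypothetical, not ZooKeeper's actual internal class:

// Hypothetical vote: the proposed leader's server id (sid) and the most
// recent transaction id (zxid) that server has executed.
record Vote(long sid, long zxid) {}

class VoteComparison {
    // Returns true if the received vote should replace our current vote.
    static boolean supersedes(Vote received, Vote current) {
        if (received.zxid() != current.zxid()) {
            return received.zxid() > current.zxid(); // newer transaction wins
        }
        return received.sid() > current.sid();       // tie on zxid: higher sid wins
    }
}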
The question is, how to update a constant? This sounds like a stupid question, but let's look at the background of my issue:
Background
I manage a network of servers, which includes a MySQL server, multiple HTTP servers, and a Minecraft server (a self-hosted server that gamers who have installed Minecraft can connect to and play together). All of the user-end services (HTTP servers, Minecraft server, user apps) are directly or indirectly related to the MySQL server. The MySQL database stores different data for each player account, for example, the online/offline status of players, etc.
In programming, constants are used to create a single reference to a value that will not change during a run, especially for software-internal identifiers such as data flags, bitmasks, etc. In my case, I also use constants to store specific data, such as the MySQL server's address and other credentials. So when I want to change the server address, I only need to modify it in one place, for example an internal constants.php on the server.
Problem
When I migrate my MySQL database to another host or change the password, I have to update the details on every server. It is not possible to create a centralized data provider that serves the server address, because the MySQL server itself is the centralized data provider. That means every time I change the value, I must update all servers. I must also maintain a very private and local list (probably written down on a memo stuck to my computer!) of all these places, because it is really hard to locate all these references. So, my question is: is there a better way of management that allows me to change the values from one place? Note that the servers are on different hosts, so it is not possible to put the value in a local file, and it doesn't sound reasonable to create a centralized data provider (call it a password provider) to provide access to the real centralized data provider (MySQL) either, since if I ever need to change the MySQL database details, I will have the same need to change the password provider details as well.
This is less of a concern, but since it is a similar question, I am putting it down here too. I use integer bitmasks to store player ranks. For example, if the player is a VIP he has the 0x01 flag, if the player is a moderator he has the 0x10 flag, and 0x11 if he is both VIP and moderator. I want to refactor the bitmask values as well, but it would be a great deal of trouble, because I would need to shut down all servers, update the MySQL values, update the constants on every server, and then restart all servers, to avoid a potential security vulnerability during the update window. Is there a more convenient way to do that?
This is a network management question too, but I consider it more programming-related.
For your first problem, we are talking about a deployment system. For example, you can use Capistrano: https://github.com/capistrano/capistrano. Keep constants.php in git and create a Capistrano task that deploys this file to each server. I use this tool to deploy projects that are among the 50 busiest sites of the Russian segment of the Internet :)
For your second problem, we are talking about data migration. There are several ways to do it, some with downtime and some without (sometimes it depends on the situation).
Data migration without downtime:
modify your app so it understands both the old variant of the player bitmask and the new one (see the sketch below)
deploy the modified app
update the bitmasks in your databases
modify your app so it understands only the new variant of the bitmasks
deploy the modified app
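As a minimal sketch of the first step, in Java for illustration (the question mentions constants.php, but the idea is the same); the flag values 0x01/0x10 are taken from the question, while the new values and the old/new-layout marker are hypothetical:

public final class Ranks {
    // Old bitmask layout (from the question)
    private static final int OLD_VIP = 0x01;
    private static final int OLD_MODERATOR = 0x10;

    // New bitmask layout (hypothetical target values)
    public static final int VIP = 0x01;
    public static final int MODERATOR = 0x02;

    // During the migration window a per-row marker (hypothetical) tells the app
    // which layout the stored rank uses; both are accepted.
    public static int normalize(int rank, boolean oldLayout) {
        if (!oldLayout) {
            return rank; // already stored in the new layout
        }
        int result = 0;
        if ((rank & OLD_VIP) != 0) result |= VIP;
        if ((rank & OLD_MODERATOR) != 0) result |= MODERATOR;
        return result;
    }
}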
I'm having to modify a Zabbix traffic light web page that shows the general availability or status of some hosts.
The update is because I'm upgrading to version 2.2 from 1.8. The status field is no longer used.
According to what I've been reading on the web and on the Zabbix website, the general way to determine availability is now to use agent.ping and a nodata() trigger on agent.ping.
How do I implement that in practice?
https://www.zabbix.com/documentation/2.2/manual/api/reference/trigger/get
It's been a while since you asked this question; nevertheless, I hope someone might find my reply useful :)
You could consider examining the host object, where the status of the interface is being reflected (either Zabbix Agent, SNMP, IPMI, JMX).
https://www.zabbix.com/documentation/2.2/manual/api/reference/host/object
This has downsides, however. The specific interface could be reported "down" for many reasons (credentials changed, firewall was changed, daemon died, etc.). That's why I chose this approach:
having one item which pings regularly
having one item which pulls data regularly (in my case Zabbix Agent or SNMP)
having one trigger for "ping fails", another for "Zabbix Agent fails" (or SNMP fails). That's where you'd use nodata(). Assign it a medium severity.
having one more trigger which checks for ping and zabbix agent failure (with a high severity) - that's my dead host detection. It gets escalated 7x24.
optional: define dependencies on the triggers so that you only get one event (host dead) if both ping & snmp/Zabbix fail
put all this into one template and assign it to the respective hosts
Now you can rely on the "host dead" trigger (it's always available no matter whether you use ping & snmp/Zabbix/JMX/whatever) - which is much more relevant than the default "interface works" status field from the host object.
I have a client software program used to launch alarms through a central server. At first it stored configuration data in registry entries, now in a configuration XML file. This configuration information consists of Alarm number, alarm group, hotkey combinations, and such.
This client connects to a server using a TCP socket, which it uses to communicate this configuration to the server. In the next generation of this program, I'm considering moving all configuration information to the server, which stores all of its information in a SQL database.
I envision using some form of web interface to communicate with the server and set up the clients, rather than the current method, which is to either configure the client software on the machine through a control panel, or on install to either push out an XML file or pass command line parameters to the MSI. I'm thinking now the only information I would want to specify on install would be the path to the server. Each workstation would be identified by computer name and configured through the server.
Are there any problems or potential drawbacks of this approach? The main goal is to centralize configuration and make it easier to make changes later, because our software is usually managed by one or two people at most.
Other than allowing the client to function offline (if such a possibility makes sense for your application), there doesn't appear to be any drawback to moving the configuration to a centralized location. Indeed, even with a centralized location, a feature can be added to the client to cache the last known configuration for use when the client is offline.
In case you implement a [centralized] database design, I suggest considering storing the configuration parameters in an Entity-Attribute-Value (EAV) structure, as this schema is particularly well suited for parameters. In particular, it allows easy addition and removal of individual parameters and also handling the parameters as a list (paving the way for a list-oriented display in the UI, so no UI changes are needed when new types of parameters are introduced).
Another reason why configuration parameter collections and EAV schemas work well together is that, even with very many users and configuration points, the configuration data remains small enough that it doesn't suffer from some of the limitations of EAV with "big" tables.
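For illustration, a minimal sketch of the EAV idea, assuming a hypothetical table with entity/attribute/value columns; the names are made up:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One row of a hypothetical config_parameters(entity, attribute, value) table.
record ConfigRow(String entity, String attribute, String value) {}

class EavConfig {
    // Collapses the rows for one client (entity) into a simple attribute -> value map.
    static Map<String, String> forEntity(List<ConfigRow> rows, String entity) {
        Map<String, String> params = new HashMap<>();
        for (ConfigRow row : rows) {
            if (row.entity().equals(entity)) {
                params.put(row.attribute(), row.value());
            }
        }
        return params; // new parameter types need no schema change, only new rows
    }
}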
The only thing that comes to mind is security of the information, and you probably have that issue in either case. It would probably be easier to interface with a database, though, as everything would be in one spot.
I can think of a few hacks using ping, the box name, and the HA shared name, but I think they lead to data leakage.
Should a box even know it's part of an HA cluster, or what that cluster's name is? Is this more a function of DNS? Is there some API exposed for boxes to join an HA cluster and request the id of the currently active node?
I want to differentiate between the inactive node and active node in alerting mechanisms for a running program. If the active node is alerting I want to hit a pager and on the inactive node I want to send an email. Pushing the determination into the alerting layer moves the same problem elsewhere.
EASY SOLUTION: Polling the server from an external agent that connects through the network makes any shell game of who is the active node a moot point. To clarify: the only thing that will page is the remote agent monitoring the real. Each box can send emails all day long for all I care.
It really depends on the HA system you're using.
For example, if your system uses a shared IP and the traffic is managed by some hardware box, then it can be hard to determine if a certain box is a master or slave. That will depend on a specific solution really... As long as you can add a custom script to the supervisor, you should be ok - for example the controller can ping a daemon on the master server every second. In the alerting script, simply check if the time of the last ping < 2 sec...
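For example, a minimal sketch of that last check, assuming the ping daemon touches a marker file (the path here is hypothetical) on every successful ping:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

class MasterPingCheck {
    // True if the marker file was touched less than 2 seconds ago.
    static boolean masterRespondedRecently() throws IOException {
        Instant lastPing = Files.getLastModifiedTime(Path.of("/var/run/ha-lastping")).toInstant();
        return Duration.between(lastPing, Instant.now()).getSeconds() < 2;
    }
}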
If your system doesn't have a supervisor / controller node, but each node tries to determine the state itself, you can have more problems. If a split brain occurs, you can end up with both slaves or both masters, so your alerting software will be wrong in both cases. Gadgets that can ensure only one live node (STONITH and others) could help.
On the other hand, in the second scenario, if the HA software works on both hosts properly, you should be able to obtain the master/slave information straight from it. It has to know its own state at any time, because it's one of its main functions. In most HA solutions you should be able to either get the current state, or add some code to run when the state changes. Heartbeat offers both.
I wouldn't worry about the edge cases like a split brain though. Almost any situation when you lose connection between the clustered nodes will be more important than the stuff that happens on the separate nodes :)
If the thing you care about is really logging / alerting only, then ideally you could have a separate logger box which gets all the information about the current network / cluster status. An external box will probably have a better idea of how to deal with the situation. If your cluster gets DoS'ed / disconnected from the network / loses power, you won't get any alert from the cluster itself. A redundant pair of independent monitors can save you from that.
I'm not sure why you mentioned DNS - due to its refresh time it shouldn't be a source of any "real-time" cluster information.
One way is to get the box to export its idea of whether it is active into your monitoring. From there you can predicate paging/emailing on this status (with a race condition around failover), and alert when none or too many systems believe they are active.
Another option is to monitor the active system via a DNS alias (or some other method to address the active system) and page on that. Then also monitor all the systems, both active and inactive, and email on that. This will cause duplicate alerts for the active system, but that's probably okay.
It's hard to be more specific without knowing more about your setup.
As a rule, the machines in a HA cluster shouldn't really know which one is active. There's one exception, mind, and that's with cronjobs. At work, we have a HA cluster on top of which some rather important services run. Some of those services have cronjobs, and we only want them running on the active box. To do that, we use this shell script:
#!/bin/sh
HA_CLUSTER_IP=0.0.0.0
if ip addr | grep $HA_CLUSTER_IP >/dev/null; then
    eval "$@"
fi
(Note that this is running on Debian.) What this does is check to see if the current box is the active one within the cluster (replace 0.0.0.0 with the external IP of your HA cluster), and if so, executes the command passed in as arguments to the script. This ensures that one and only one box is ever actually executing the cronjobs.
Other than that, there's really no reason I can think of why you'd need to know which box is the active one.
UPDATE: Our HA cluster uses Heartbeat to assign the cluster's external IP address as a secondary address to the active machine in the cluster. Programmatically, you can check to see if your machine is the current active box by calling gethostbyname(), and iterating over the data returned until you either get to the end or you find the cluster's IP in the list.
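As a rough Java translation of that idea (InetAddress.getAllByName() plays the role of gethostbyname()); whether the Heartbeat-assigned secondary address shows up in the result depends on your name resolution setup:

import java.net.InetAddress;
import java.net.UnknownHostException;

class ActiveNodeCheck {
    // True if the cluster's shared IP is among the addresses resolved for this host's name.
    static boolean isActiveNode(String clusterIp) throws UnknownHostException {
        InetAddress[] addresses = InetAddress.getAllByName(InetAddress.getLocalHost().getHostName());
        for (InetAddress addr : addresses) {
            if (addr.getHostAddress().equals(clusterIp)) {
                return true; // the cluster IP is currently assigned here, so we are active
            }
        }
        return false;
    }
}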
Without hard-coding...? I assume you mean some native Heartbeat query; not sure. However, you could use ifconfig: HA creates a virtual interface on whatever interface it is configured to run on. For instance, if HA was configured on eth0, then it would create a virtual interface eth0:0, but only on the active node.
Therefore you could do a simple query of the ifconfig output to determine whether the server was the active node or not. For example, if eth0 was the configured interface:
ACTIVE_NODE=`ifconfig | grep -c 'eth0:0'`
That will set the $ACTIVE_NODE variable to 1 (for active) and 0 (if standby). Hope that may help.