File/Data transfer between two arbitrary sources - language-agnostic

I'm looking for a simple way to implement this scenario:
Say I have two machines I'd like to share data between. The locations/addresses of these machines can change at any time. I'd like both machines to check in with a central server to announce their availability. One of the two systems wants to pull a file from the other. I know that I can have the sink system make a request to the server, which then requests the file from the source, pulls it, and feeds it to the requester. However, this seems inefficient from a bandwidth perspective: the file will be transferred twice. Is there a system in place where the source can send it directly to the sink?
Without being able to guarantee things like port forwarding when a system is behind a firewall, etc., I don't know of a way to do this.
Thanks.

When machine A wants to send data to machine B:
A sends a request to the central server C.
C asks B for permission. If accepted, C gives B's IP and port to A.
A attempts to connect to B directly.
If that fails (i.e., if B is behind a router/firewall), A notifies C of the failure. C then gives A's IP and port to B, and B attempts to connect directly to A (an outbound connection, which should be able to pass through B's own firewall/router).
If either connection succeeds, A has a direct connection over which to send data to B.
If both connections fail (i.e., if A is also behind a firewall/router), C has to act as a proxy for all transfers between A and B.
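For illustration, here is a minimal, language-agnostic sketch (in Python) of the sink's side of that procedure; the central-server calls at the bottom are hypothetical placeholders, not a real API:

import socket

def try_direct_pull(peer_ip, peer_port, request, timeout=5):
    # Attempt a direct connection to the source and read the whole file.
    # Returns None if the connection fails (e.g. the source is behind NAT).
    try:
        with socket.create_connection((peer_ip, peer_port), timeout=timeout) as s:
            s.sendall(request)
            chunks = []
            while True:
                data = s.recv(65536)
                if not data:
                    break
                chunks.append(data)
            return b"".join(chunks)
    except OSError:
        return None

# Hypothetical orchestration against the central server C:
# peer_ip, peer_port = ask_server_for_peer_endpoint("source-machine")
# data = try_direct_pull(peer_ip, peer_port, b"GET file.bin\n")
# if data is None:
#     report_failure_to_server()                      # C then tells the source to dial us instead,
#     data = wait_for_reverse_connection_or_relay()   # or C relays the file itself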

Constants over multiple servers

The question is, how to update a constant? This sounds like a stupid question, but let's look at the background of my issue:
Background
I manage a network of servers, which includes a MySQL server, multiple HTTP servers, and a Minecraft server (a self-hosted server that gamers who have installed Minecraft can connect to and play on together). All of the user-end services (HTTP servers, Minecraft server, user apps) depend directly or indirectly on the MySQL server. The MySQL database stores per-player data, for example each player's online/offline status.
In programming, constants are used to create a general reference to a value that will not change across a runtime, especially for software-internal identifiers such as data flags, bitmasks, etc. In my case, I also use constants to store specific data, such as the MySQL server's address and other credentials. So when I want to change the server address, I only need to modify it in one place, for example an internal constants.php on the server.
Problem
When I migrate my MySQL database to another host or change its password, I have to update the details on every server. It is not possible to create a centralized data provider for the server address, because the MySQL server itself is the centralized data provider. That means every time I change a value, I must update all servers. I must also maintain a very private, local list (probably written on a memo stuck to my computer!) of all these places, because it is really hard to locate all the references. So my question is: is there a better way of management that lets me change the values in one place? Note that the servers are on different hosts, so it is not possible to put the values in a local file, and it doesn't seem reasonable to create a centralized data provider (call it a password provider) that provides access to the real centralized data provider (MySQL) either, since whenever I need to change the MySQL database details, I have the same need to change the password provider details as well.
This is less of a concern, but since it is a similar question, I am putting it down here too. I use integer bitmasks to store player ranks. For example, if the player is a VIP, he has the 0x01 flag; if the player is a moderator, he has the 0x10 flag; and 0x11 if he is both VIP and moderator. I want to refactor the bitmask values as well, but it would be great trouble, because I would need to shut down all servers, update the MySQL values, update constants on every server, and then restart all servers, to avoid potential security vulnerabilities during the update window. Is there a more convenient way to do that?
This is a network management question too, but I consider it more programming-related.
This is a deployment-system problem. For example, you can use Capistrano: https://github.com/capistrano/capistrano. Keep constants.php in git and create a Capistrano task that deploys this file to each server. I use this tool for deploying projects that are among the 50 busiest sites of the Russian segment of the Internet. :)
The bitmask change is a data-migration problem. There are several ways to do it, some with downtime and some without (sometimes it depends on the situation).
Data migration without downtime:
modify your app so it understands both the old and the new variant of the player bitmask
deploy the modified app
update the bitmasks in your databases
modify your app so it understands only the new variant of the bitmask
deploy the modified app
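A minimal sketch (in Python, with made-up new flag values) of the transitional code from the first step, which accepts both bitmask layouts until the database rows have been rewritten:

OLD_VIP, OLD_MOD = 0x01, 0x10      # layout from the question
NEW_VIP, NEW_MOD = 0x01, 0x02      # hypothetical new layout
MIGRATED = 0x80000000              # hypothetical marker bit set on rewritten rows

def read_rank(raw):
    # Return (is_vip, is_moderator) regardless of which layout the row uses.
    if raw & MIGRATED:
        return bool(raw & NEW_VIP), bool(raw & NEW_MOD)
    return bool(raw & OLD_VIP), bool(raw & OLD_MOD)

def migrate(raw):
    # Rewrite an old-format value into the new format (the database-update step).
    vip, mod = read_rank(raw)
    return MIGRATED | (NEW_VIP if vip else 0) | (NEW_MOD if mod else 0)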

Scaling websocket node server

I know this question has been asked partially before (How to Scale Node.js WebSocket Redis Server?), but I am wondering if there are any alternatives to Redis for rapidly sharing websocket objects between node instances, specifically ws-type sockets (https://github.com/einaros/ws). I've tried Redis and ran into issues with the fact that the websocket objects are cyclic and difficult to serialise. I then used Crockford's cycle.js (https://github.com/douglascrockford/JSON-js/blob/master/cycle.js); however, it seems to strip out the websocket objects' methods, as I get an error from node saying "Object object has no method send" after I have read the socket back from Redis and retrocycled it. Any help would be much appreciated.
Thanks in advance, James.
IMO you should use a message queue for that, e.g. RabbitMQ:
The application starts on Node A and Node B and connects to RabbitMQ.
Client A connects to Node A and subscribes to a queue named XXX.
Client B connects to Node B and subscribes to a queue named XXX.
Client A sends a message to the websocket server; the websocket server passes the message to Node A.
Node A publishes the message to RabbitMQ queue XXX.
Node B receives the message from RabbitMQ, as it is subscribed to queue XXX.
Node B sends the message to Client B, or publishes it to all clients connected to Node B.
So all you need is to put a message queue (RabbitMQ, ZeroMQ, etc.) in your architecture.
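The question is about Node.js, but as a language-agnostic illustration, here is roughly what each node does, sketched in Python with pika against a fanout exchange; the exchange name "xxx" and the broadcast_to_local_clients hook are assumptions, not part of any real setup:

import pika

def broadcast_to_local_clients(body):
    pass  # placeholder: send body to every websocket client connected to this node

# Each node (A, B, ...) runs this: publish local websocket traffic to the
# exchange, and deliver anything received from the exchange to local clients.
conn = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
ch = conn.channel()
ch.exchange_declare(exchange="xxx", exchange_type="fanout")

# A private, exclusive queue bound to the fanout exchange for this node.
q = ch.queue_declare(queue="", exclusive=True).method.queue
ch.queue_bind(exchange="xxx", queue=q)

def publish(message):
    # Called when a local websocket client sends a message.
    ch.basic_publish(exchange="xxx", routing_key="", body=message)

def on_mq_message(channel, method, properties, body):
    # Called for every message published to the exchange by any node.
    broadcast_to_local_clients(body)

ch.basic_consume(queue=q, on_message_callback=on_mq_message, auto_ack=True)
# ch.start_consuming()   # blocks; in practice run the consumer in its own thread/process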
There is a library that makes it easy to scale WebSockets across Node.js processes and machines; you can check it out:
https://github.com/ClusterWS/ClusterWS
When we speak of scalability we expect, or want to hear, the words "linear performance gains". To be honest, though, this is not the case for most setups, as their reliance on another server/service is too great, and thus bottlenecks form within the network you're trying to host for users.
As we explore options we hear things like databases, message queues, and brokers. These are fine to use, but as mentioned above, if the reliance on any of them is far too great you will destroy your setup in short order.
Design the WSS server to act solo (unless requirements are exceeded). You determine and set the limits and let the API server know them. For example, if I have 10 chat rooms that hold at most 100 users each, and benchmarking my WSS server proved it could hold 400-500 connections, then with that information I'd assign 4-5 rooms per server. So if two people enter room #1 they are on WSS server #1; if all 10 chat rooms are in use then WSS server #2 is now full, and an 11th room will need WSS server #3, which covers up to the 15th room.
The slowest part of the network would now just be your API server handling requests, though this may include the database as well.
If your requirements are for more users than in the example, you can increase core power first or add a second server with the help of an MQ or Redis pub/sub type setup.
Unfortunately there's no way to perfectly pack users, so if 3 rooms had only 20 users each and all were sitting on WSS server #1, that would still leave rooms with hundreds of user slots available. But is this really a problem?
It's possible those rooms could fill right up, so leave them the slots; it could still be days until they max out, so programming something tailored to your needs will improve how cost-effective the setup is.
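A tiny sketch of that capacity math, assuming the numbers from the example above (100 users per room, a benchmarked ceiling of roughly 450 connections per WSS server):

USERS_PER_ROOM = 100
CONNECTIONS_PER_SERVER = 450                                  # from benchmarking your WSS server
ROOMS_PER_SERVER = CONNECTIONS_PER_SERVER // USERS_PER_ROOM   # -> 4

def server_for_room(room_number):
    # Rooms are packed onto servers in fixed blocks: rooms 1-4 on server 1, 5-8 on server 2, ...
    return (room_number - 1) // ROOMS_PER_SERVER + 1

# server_for_room(1)  -> 1
# server_for_room(11) -> 3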

Configuring BGP Neighbors In Quagga

I am trying to run Quagga on a couple of connected VMs and am confused about how to write the neighbor command in the bgpd.conf configuration file. All my questions are about the following neighbor statement:
neighbor peer remote-as asn
What should I provide for the 'peer' IP value?
Say I am configuring a VM A which is many hops away from a neighbor B (let's assume the same AS number). When I add neighbor B to the bgpd.conf configuration file, which particular interface IP of B should be used as the peer IP?
I am seeing that for some interface IPs the session gets established and for some it does not, so I want to know, theoretically, which of the interface IPs should be specified.
I did a lot of Googling but found nothing clear about this.
Please help.
Are you able to ping the IP addresses/interfaces with which you are unable to establish the BGP session?
If not, then you can't establish the session: BGP needs plain IP reachability (a TCP connection on port 179) to the peer address, so the address you configure must be routable from your VM.
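As a hedged illustration only (the addresses and AS number below are made up), a common pattern for iBGP peers that are several hops apart is to peer between loopback addresses that each router can already route to, and to source the session from the loopback:

! bgpd.conf on VM A (loopback 10.0.0.1); B's loopback is 10.0.0.2
router bgp 65000
 bgp router-id 10.0.0.1
 neighbor 10.0.0.2 remote-as 65000
 neighbor 10.0.0.2 update-source lo
! 10.0.0.2 must be reachable from A (via IGP or static routes) for the session to come up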

Bi-directional communication with 1 socket - how to deal with collisions?

I have one app that consists of a "Manager" and a "Worker". Currently, the worker always initiates the connection, says something to the manager, and the manager sends the response.
Since there is a LOT of communication between the manager and the worker, I'm considering keeping a socket open between the two and doing the communication over it. I'm also hoping to initiate the interaction from both sides - enabling the manager to say something to the worker whenever it wants.
However, I'm a little confused as to how to deal with "collisions". Say the manager decides to say something to the worker, and at the same time the worker decides to say something to the manager. What will happen? How should such a situation be handled?
P.S. I plan to use Netty for the actual implementation.
"I'm also hoping to initiate the interaction from both sides - enabling the manager to say something to the worker whenever it wants."
Simple answer. Don't.
Learn from existing protocols: Have a client and a server. Things will work out nicely. Worker can be the server and the Manager can be a client. Manager can make numerous requests. Worker responds to the requests as they arrive.
Peer-to-peer can be complex, with no real value gained for the added complexity.
I'd go for a persistent bi-directional channel between server and client.
If all you'll have is one server and one client, then there's no collision issue... If the server accepts a connection, it knows it's the client and vice versa. Both can read and write on the same socket.
Now, if you have multiple clients and your server needs to send a request specifically to client X, then you need handshaking!
When a client boots, it connects to the server. Once this connection is established, the client identifies itself as being client X (the handshake message). The server now knows it has a socket open to client X and every time it needs to send a message to client X, it reuses that socket.
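A minimal sketch of that bookkeeping (in Python, ignoring threading and error handling; the tutorial linked below does the real thing with Netty):

clients = {}   # client id -> the socket that client opened to us

def on_handshake(sock, client_id):
    # The first message on a new connection identifies the client.
    clients[client_id] = sock

def send_to(client_id, message):
    # Whenever the server needs to talk to client X, reuse X's socket.
    clients[client_id].sendall(message)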
Lucky you, I've just written a tutorial (sample project included) on this precise problem. Using Netty! :)
Here's the link: http://bruno.linker45.eu/2010/07/15/handshaking-tutorial-with-netty/
Notice that in this solution, the server does not attempt to connect to the client. It's always the client who connects to the server.
If you were thinking about opening a new socket every time you wanted to send a message, you should reconsider: persistent connections avoid the overhead of connection establishment, consequently speeding up the data transfer rate N-fold.
I think you need to read up on sockets.
You don't really get these kinds of problems. Other than how to responsively handle both receiving and sending, which is generally done by threading your communications, you can take a number of approaches to this depending on the app.
The correct link to the Handshake/Netty tutorial mentioned in brunodecarvalho's response is http://bruno.factor45.org/blag/2010/07/15/handshaking-tutorial-with-netty/
I would add this as a comment to his question but I don't have the minimum required reputation to do so.
If you feel like reinventing the wheel and don't want to use middleware...
Design your protocol so that the other peer's answers to your requests are always easily distinguishable from requests initiated by the other peer. Then choose your network I/O strategy carefully. Whatever code is responsible for reading from the socket must first determine whether the incoming data is a response to data that was sent out, or a new request from the peer (by looking at the data's header, and at whether you've issued a request recently). You also need to maintain proper queueing so that the responses you send to the peer's requests are properly separated from the new requests you issue.
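A minimal sketch of such a protocol (in Python; the frame layout is made up): every message carries a kind marker and a correlation id, so the reader can tell a peer-initiated request apart from an answer to one of its own requests:

import itertools, json

_next_id = itertools.count(1)
_pending = {}   # ids of requests we sent that are still awaiting an answer

def make_request(payload):
    # Build a new request frame with a fresh correlation id.
    msg_id = next(_next_id)
    _pending[msg_id] = payload
    return json.dumps({"kind": "req", "id": msg_id, "payload": payload})

def make_response(request_msg, payload):
    # Answer a peer request, echoing its id so the peer can match it up.
    return json.dumps({"kind": "res", "id": request_msg["id"], "payload": payload})

def dispatch(raw):
    # Whatever reads from the socket calls this for every incoming frame.
    msg = json.loads(raw)
    if msg["kind"] == "res" and msg["id"] in _pending:
        _pending.pop(msg["id"])   # reply to something we sent; hand it to the caller
    else:
        pass                      # a new request from the peer; service it, then send make_response(msg, ...)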

Can a webserver determine if it's the active node of an HA failover system without hard coding anything on the server itself?

I can think of a few hacks using ping, the box name, and the HA shared name but I think that they are leading to data leakage.
Should a box even know it's part of an HA cluster, or what that cluster's name is? Is this more a function of DNS? Is there some API exposed for boxes to join an HA cluster and request the ID of the currently active node?
I want to differentiate between the inactive node and active node in alerting mechanisms for a running program. If the active node is alerting I want to hit a pager and on the inactive node I want to send an email. Pushing the determination into the alerting layer moves the same problem elsewhere.
EASY SOLUTION: Polling the server from an external agent that connects through the network makes any shell game of who is the active node a moot point. To clarify: the only thing that will page is the remote agent monitoring the real servers. Each box can send emails all day long for all I care.
It really depends on the HA system you're using.
For example, if your system uses a shared IP and the traffic is managed by some hardware box, then it can be hard to determine whether a certain box is the master or a slave. That really depends on the specific solution... As long as you can add a custom script to the supervisor, you should be OK - for example, the controller can ping a daemon on the master server every second, and in the alerting script you simply check whether the time since the last ping is < 2 sec.
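A tiny sketch of that freshness check (in Python), assuming the ping daemon touches a timestamp file whose path is made up here:

import os, time

LAST_PING = "/var/run/ha-master-ping"   # hypothetical file the ping daemon updates

def master_looks_alive(max_age=2.0):
    # True if the master has been pinged within the last couple of seconds.
    try:
        return time.time() - os.path.getmtime(LAST_PING) < max_age
    except OSError:
        return False   # file missing: no ping recorded yet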
If your system doesn't have a supervisor/controller node, but each node tries to determine the state itself, you can have more problems. If a split brain occurs, you can end up with two slaves or two masters, so your alerting software will be wrong in both cases. Gadgets that can ensure only one live node (STONITH and others) could help.
On the other hand, in the second scenario, if the HA software works on both hosts properly, you should be able to obtain the master/slave information straight from it. It has to know its own state at any time, because it's one of its main functions. In most HA solutions you should be able to either get the current state, or add some code to run when the state changes. Heartbeat offers both.
I wouldn't worry about the edge cases like a split brain though. Almost any situation when you lose connection between the clustered nodes will be more important than the stuff that happens on the separate nodes :)
If the thing you care about is really logging/alerting only, then ideally you could have a separate logger box which gets all the information about the current network/cluster status. An external box will probably have a better idea of how to deal with the situation. If your cluster gets DoS'ed, disconnected from the network, or loses power, you won't get any alert; a redundant pair of independent monitors can save you from that.
I'm not sure why you mentioned DNS - due to its refresh time it shouldn't be a source of any "real-time" cluster information.
One way is to get the box to export its idea of whether it is active into your monitoring. From there you can predicate paging/emailing on this status (with a race condition around failover), and alert when none, or too many, of the systems believe they are active.
Another option is to monitor the active system via a DNS alias (or some other method to address the active system) and page on that. Then also monitor all the systems, both active and inactive, and email on that. This will cause duplicate alerts for the active system, but that's probably okay.
It's hard to be more specific without knowing more about your setup.
As a rule, the machines in an HA cluster shouldn't really know which one is active. There's one exception, mind, and that's cronjobs. At work, we have an HA cluster on top of which some rather important services run. Some of those services have cronjobs, and we only want them running on the active box. To do that, we use this shell script:
#!/bin/sh
# Run the given command only if this box currently holds the cluster's external IP.
HA_CLUSTER_IP=0.0.0.0
if ip addr | grep $HA_CLUSTER_IP >/dev/null; then
    eval "$@"
fi
(Note that this is running on Debian.) What this does is check to see if the current box is the active one within the cluster (replace 0.0.0.0 with the external IP of your HA cluster), and if so, executes the command passed in as arguments to the script. This ensures that one and only one box is ever actually executing the cronjobs.
Other than that, there's really no reason I can think of why you'd need to know which box is the active one.
UPDATE: Our HA cluster uses Heartbeat to assign the cluster's external IP address as a secondary address to the active machine in the cluster. Programmatically, you can check to see if your machine is the current active box by calling gethostbyname(), and iterating over the data returned until you either get to the end or you find the cluster's IP in the list.
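A minimal sketch of that check (in Python); it relies on the hostname resolving to all of the box's addresses, including the secondary one Heartbeat assigns:

import socket

CLUSTER_IP = "0.0.0.0"   # replace with your cluster's external IP

def is_active_node():
    # gethostbyname_ex returns (hostname, aliases, list of this host's addresses).
    _, _, addresses = socket.gethostbyname_ex(socket.gethostname())
    return CLUSTER_IP in addresses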
Without hard-coding...? I assume you mean some native Heartbeat query; I'm not sure of one. However, you could use ifconfig: HA creates a virtual interface on whatever interface it is configured to run on. For instance, if HA is configured on eth0, it creates a virtual interface eth0:0, but only on the active node.
Therefore you could do a simple query of the ifconfig output to determine whether the server is the active node or not. For example, if eth0 is the configured interface:
ACTIVE_NODE=`ifconfig | grep -c 'eth0:0'`
That will set the $ACTIVE_NODE variable to 1 (active) or 0 (standby). Hope that helps.
http://www.of-networks.co.uk