Scaling websocket node server - json

I know this question has been partially asked before (How to Scale Node.js WebSocket Redis Server?), but I am wondering if there are any alternatives to Redis for rapidly sharing websocket objects between node instances, specifically ws-type sockets (https://github.com/einaros/ws). I've tried Redis and ran into issues with the fact that the websocket objects are cyclic and difficult to serialise. I then used Crockford's cycle.js (https://github.com/douglascrockford/JSON-js/blob/master/cycle.js), but it seems to strip out the websocket object's methods: I get an error from node saying "Object object has no method send" after I have read the socket back from Redis and retrocycled it. Any help would be much appreciated.
Thanks in advance, James.

IMO you should use a message queue for this, e.g. RabbitMQ:

1. The application starts on Node A and Node B, and both connect to RabbitMQ.
2. Client A connects to Node A and subscribes to a queue named XXX.
3. Client B connects to Node B and subscribes to queue XXX.
4. Client A sends a message to the websocket server on Node A.
5. Node A publishes the message to RabbitMQ queue XXX.
6. Node B receives the message from RabbitMQ, as it is subscribed to queue XXX.
7. Node B sends the message to Client B, or publishes it to all clients connected to Node B.

So all you need is to put a message queue (RabbitMQ, ZeroMQ, etc.) into your architecture, as sketched below.
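A minimal sketch of that setup, assuming the `ws` server from the question and the `amqplib` client for RabbitMQ (the exchange name `chat` and the message shape are illustrative). The key point is that only plain message payloads cross process boundaries; the cyclic, unserialisable socket objects never leave their own process:

```js
const WebSocket = require('ws');
const amqp = require('amqplib');

async function start(port) {
  const wss = new WebSocket.Server({ port });

  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertExchange('chat', 'fanout', { durable: false });

  // Each node gets its own exclusive queue bound to the shared exchange.
  const { queue } = await ch.assertQueue('', { exclusive: true });
  await ch.bindQueue(queue, 'chat', '');

  // Messages arriving from RabbitMQ are fanned out to this node's local
  // sockets only; the socket objects themselves stay in this process.
  ch.consume(queue, (msg) => {
    for (const client of wss.clients) {
      if (client.readyState === WebSocket.OPEN) client.send(msg.content.toString());
    }
  }, { noAck: true });

  // Messages from local sockets are published for every node to consume.
  wss.on('connection', (socket) => {
    socket.on('message', (data) => ch.publish('chat', '', Buffer.from(data)));
  });
}

start(Number(process.env.PORT) || 8080);
```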

There is a library which makes it easy to scale WebSockets across node.js processes and machines; you can check it out:
https://github.com/ClusterWS/ClusterWS

When we speak of scalability we expect, or want to hear, the words "linear performance gains". To be honest, though, this is not the case for most setups: their reliance on another server/service is too great, and bottlenecks form within the network you're trying to host for users.
As we explore options we hear things like databases, message queues, and brokers. These are fine to use, but as mentioned above, if the reliance on any of them is too great, you will destroy your setup in short order.
Design the WSS server to act solo (unless requirements are exceeded). You determine and set limits and let the API server know them. Say I have 10 chat rooms that hold a maximum of 100 users each, and benchmarking my WSS server proved it could hold 400-500 of them; with that information I'd assign 4-5 rooms per server. So if two people enter room #1, they are on WSS server #1; if all 10 chat rooms fill up, WSS server #2 is now full, and an 11th room would need WSS server #3, which covers up to the 15th room.
The slowest part of the network would now just be your API server handling requests, though this may include the database as well.
If your requirements call for more users than in the example, you can scale up the hardware first, or add a second server with the help of an MQ or a Redis pub/sub type setup.
Unfortunately there's no way to pack users perfectly, so if 3 rooms each had 20 users and all were sitting on WSS server #1, that would still leave a room with hundreds of available user slots. But is this really a problem?
It's possible this room could fill right up, so leave it the spot; but it could still be days until it maxes out, so programming something tailored to your needs will improve how cost-effective the setup is. The sketch below makes the capacity math concrete.
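A toy sketch of that static room-to-server math, using the hypothetical numbers from the example above (100-user rooms, a benched capacity of ~500 sockets per WSS server, hence 5 rooms per server); the host list is also hypothetical. The API server can then deterministically tell a client which WSS host owns a given room:

```js
// Assumed numbers from the example: 100-user rooms, ~500 sockets per server.
const USERS_PER_ROOM = 100;
const SOCKETS_PER_SERVER = 500;
const ROOMS_PER_SERVER = Math.floor(SOCKETS_PER_SERVER / USERS_PER_ROOM); // 5

// Hypothetical host list the API server hands out to clients.
const WSS_HOSTS = ['wss1.example.com', 'wss2.example.com', 'wss3.example.com'];

// Rooms 1-5 live on server #1, 6-10 on server #2, 11-15 on server #3, ...
function hostForRoom(roomId) {
  return WSS_HOSTS[Math.floor((roomId - 1) / ROOMS_PER_SERVER)];
}

console.log(hostForRoom(1));  // wss1.example.com
console.log(hostForRoom(11)); // wss3.example.com
```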

Related

HTML5 Websocket Live-Application Limitations

I'm developing an HTML5 websocket-based application which should notify users in real time about different events. The client connects to the server and sends a handshake with a security token; the server checks whether the token is valid and adds the client to the list of active clients. From then on the client gets notifications about special events.
Because there are different notifications from multiple applications, there is a notification core which handles the basics of the connection and also the authentication, because this is always the same. The core can be accessed by the applications, which use it to communicate with the server.
Does it make sense, or is it necessary, to build some limitations into the core? For example, tracking the user's IP and refusing the connection if the user has made more than, let's say, 3 connections to the server in the last 10 seconds, to prevent flood attacks.
In my opinion it could reduce server load if someone tries to crash my service by holding the F5 key or using a botnet, as long as he isn't sending more traffic than my connection can handle.
I'm using socket.io, if this is important.
If you're trying to protect your application from malicious attacks, there are many, many things you would need to consider, and it is important to prioritize them and spend your development time on the things that could most impact your service. I would think that creating multiple webSocket connections would be very low on the priority list, way behind operations in your service that actually change state, such as writes to a database. Modern servers can easily hold tens of thousands of sockets, and it costs little server load to just send the same notification to lots of sockets.
In addition, using the IP address as something to limit by can cause problems because larger organizations may use NAT to share a single IP address among many users for outbound connections. If you are going to limit by user, it's much better to limit by a userID (something each user uniquely logs in with).
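A hedged sketch of that idea with socket.io (which the question mentions): cap concurrent connections per user ID rather than per IP. The limit of 3 and the `verifyToken` stub are assumptions standing in for whatever token check the handshake already performs:

```js
const { Server } = require('socket.io');
const io = new Server(3000);

// Hypothetical stub: replace with your real security-token validation.
const verifyToken = (token) => (token ? String(token).split(':')[0] : null);

const MAX_SOCKETS_PER_USER = 3;  // assumed policy
const socketsByUser = new Map(); // userId -> count of open sockets

io.use((socket, next) => {
  // Assume the client sends its security token in the handshake auth payload.
  const userId = verifyToken(socket.handshake.auth.token);
  if (!userId) return next(new Error('unauthorized'));
  if ((socketsByUser.get(userId) || 0) >= MAX_SOCKETS_PER_USER) {
    return next(new Error('too many connections'));
  }
  socket.data.userId = userId;
  next();
});

io.on('connection', (socket) => {
  const id = socket.data.userId;
  socketsByUser.set(id, (socketsByUser.get(id) || 0) + 1);
  socket.on('disconnect', () => {
    const n = socketsByUser.get(id) - 1;
    n > 0 ? socketsByUser.set(id, n) : socketsByUser.delete(id);
  });
});
```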

Implementing dynamically updating upvote/downvote

How do you implement a dynamically updating vote count similar to Quora's, where whenever a user upvotes an answer it's reflected automatically for everyone viewing that page?
I am looking for an answer that addresses the following:
- Do we have to keep polling for upvote counts for every answer? If yes, how do we manage the server load arising from so many users polling for upvotes?
- Or should we use websockets/push notifications? How scalable are these?
- How do we store the upvote/downvote counts in a database or in memory to support this, and how do we control the number of reads/writes? My backend database is MySQL.
The answer I am looking for may not be exactly how Quora does it, but rather how this can be done using available open-source technologies.
It's not the back-end system details that you need to worry about, but the front end. Having connections open all the time is impractical at any real scale. Instead, you want the opposite: to be able to serve and close connections from the back end as fast as you can.
Websockets are a sexy technology but, again, in the real world there are issues with proxies, and if you are developing something that should work on a variety of screens (desktop, tablet, mobile) this might become a concern. Even good old long polling might not work through firewalls and proxies.
Here is the good news: I think
"keep polling for upvote counts for every answer"
is a totally good solution in this case. Consider the following:
- Your use case does not need truly real-time updates; there is little harm in seeing the counter update a bit later.
- For very popular topics you would want to squash multiple up-votes/down-votes into one update anyway.
- Most topics will see no up-vote/down-vote traffic at all for days or weeks, so keeping a connection open waiting for an event that never comes is a waste.
- Most users who just come to read a topic will never up-vote/down-vote, so your read/write ratio for topic stats will be greatly skewed toward reads.
- Network latencies vary hugely across clients; you will see horrible transfer rates for a 100-byte HTTP response, and while a sluggish client is fetching its response byte by byte, your precious server connection, and more importantly a thread on a back-end server, is busy.
Here is what I'd start with:
- Have browsers periodically poll for new topic stats after the main page loads.
- Keep your MySQL, and keep the counters there. Every time there is an up/down vote, update the DB.
- Put Memcached in front of the DB as a write-through cache, i.e. every time there is an up/down vote, update the cache, then update the DB (see the sketch after this list). Set an explicit expiry time of 10-15 minutes for each counter; every time a counter is updated, its expiry time is extended automatically.
- Design these polling HTTP calls to be cacheable by HTTP proxies; set the expiry and TTL HTTP headers to 60 seconds.
- Put a reverse proxy (Varnish, nginx) in front of your front-end servers and have it cache the said polling calls. This takes care of the second-level cache and helps free up back-end server threads quicker; see the network latency concern above.
- Set up the reverse proxy to talk to the memcached servers directly, without making a call to the back-end server; yes, you can do this with both Varnish and nginx.
- There is no fancy schema for storing such data; it's a simple inc()/dec() operation in memcached, which is safe from the race-condition point of view. It is also a safe, atomic operation in MySQL: UPDATE table SET field = field + 1 WHERE [...].
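A sketch of that write-through path, assuming the `memcached` and `mysql2/promise` npm clients; the table and column names are illustrative:

```js
const Memcached = require('memcached');
const mysql = require('mysql2/promise');

const cache = new Memcached('localhost:11211');
const db = mysql.createPool({ host: 'localhost', user: 'app', database: 'qa' });
const TTL = 15 * 60; // the 10-15 minute expiry suggested above, in seconds

async function upvote(answerId) {
  const key = `votes:${answerId}`;
  // Update the cache first: incr() is atomic. If the key has expired,
  // re-seed it from the authoritative MySQL count.
  cache.incr(key, 1, async (err, result) => {
    if (err || result === false) {
      const [rows] = await db.execute(
        'SELECT votes FROM answers WHERE id = ?', [answerId]);
      cache.set(key, rows[0].votes + 1, TTL, () => {});
    }
  });
  // Then persist with the race-free atomic increment mentioned above.
  await db.execute('UPDATE answers SET votes = votes + 1 WHERE id = ?', [answerId]);
}
```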
Aggressive multi-level caching covers your read path: in Memcached and in all the HTTP caches along the way; note that these HTTP poll requests will be cached on the edges as well.
To take care of the long tail of unpopular topics, make the HTTP TTL for such responses inversely proportional to popularity.
A read request will only infrequently get to the front-end server, when the HTTP cache has expired and memcached does not have the counter either. If that is still a problem, add memcached servers and increase the expiry time in memcached across the board.
Once you are done with that, you have all the reads taken care of. The only problem you might still have, depending on the scale, is a high rate of writes, i.e. the flow of up/down votes. This is where your single MySQL instance might start showing some lag. Fear not: proceed along the old beaten path of sharding your instances, or add a NoSQL store just for the counters.
Do not use any messaging system unless it's absolutely necessary, or unless you want an excuse to play with one.
Websockets, Server-Sent Events (I think that's what you meant by 'push notifications') and AJAX long polling all have the same drawback: they keep the underlying TCP connection open for a long time.
So the question is how many open TCP connections can a server handle.
Basically, it depends on the OS, the number of file descriptors (a config parameter) and the available memory (each open connection reserves read/write buffers).
Here's more on that.
We once tested the possibility of keeping 1 million websocket connections open on a single server (Windows 7 x64 with 16 GB of RAM, JVM 1.7 with 8 GB of heap, using an Undertow beta to serve web requests).
Surprisingly, the hardest part was generating the load on the server. :)
It managed to hold 1M, but the server wasn't doing anything useful; it just received requests, went through the protocol upgrade and kept the connections open.
There was also some number of lost connections, for whatever reason; we didn't investigate. In production you would also have to ping the server and handle reconnection.
Apart from that, websockets seem like overkill here, and SSE still isn't widely adopted.
So I would go with good old AJAX polling, but optimize it as much as possible.
It works everywhere, is simple to implement and tweak, has no reliance on an external system (I've had bad experiences with that several times), and leaves room for optimization.
For instance, you could group the updates for all open articles into a single request per browser, or adjust the update interval according to how popular the article is, as in the sketch below.
After all, it doesn't seem like you need real-time notifications here.
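A browser-side sketch of that optimized polling: batch all visible answers into one request and back off for quiet pages. The `/votes` endpoint, its JSON shape, and the interval thresholds are assumptions:

```js
// Collect the ids of every answer currently on the page.
const answerIds = [...document.querySelectorAll('[data-answer-id]')]
  .map((el) => el.dataset.answerId);

let intervalMs = 60 * 1000; // matches the 60s proxy/browser cache TTL above

async function pollVotes() {
  // One batched request for all open answers instead of one per answer.
  const res = await fetch(`/votes?ids=${answerIds.join(',')}`);
  const counts = await res.json(); // e.g. { "42": 17, "43": 3 }
  for (const [id, votes] of Object.entries(counts)) {
    document.querySelector(`[data-answer-id="${id}"] .count`).textContent = votes;
  }
  // Long-tail tuning: poll busy pages more often, dead pages less.
  const total = Object.values(counts).reduce((a, b) => a + b, 0);
  intervalMs = total > 100 ? 30 * 1000 : 120 * 1000;
  setTimeout(pollVotes, intervalMs);
}

setTimeout(pollVotes, intervalMs);
```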
It sounds like you might be able to use a messaging system like Kafka, RabbitMQ or ActiveMQ. Your front end would send votes to a message channel and receive them with a listener, and you could have a server-side piece persist the votes to the DB periodically.
You could also accomplish the task by polling your database and incrementing/decrementing a number related to a post via a stored procedure. There are a bunch of options here, and it depends on how much concurrency you may be facing.

Transfer of Websocket communication from server to client in Streaming

I'm new to websocket streaming applications.
I'm trying to evaluate the Kaazing and Solace streaming vendor products.
I'm trying to put a layer, or interface application, in front of the Kaazing publisher before it creates the socket connection.
The client would make a request to the interface for the socket connection; the interface's role is to authenticate, authorize and make some business changes before the socket is created.
The interface establishes the secured socket connection with Kaazing (or the streaming application) and transfers the connection object to the client in the response.
The client uses the established connection from the response and retains it for streaming data from the streaming server to the client.
The objective is to hide the topic info and the connection establishment from the client side, for a secured process.
So the interface creates the secured connection and transfers it to the authorized client, which continues streaming from the streaming server until the session expires.
Please let me know whether designing such an application is possible.
The client receives data, but it doesn't establish a connection of its own; the connection is created on the server side and transferred to the client side after all the necessary validation.
Guide me on how to proceed further with the design; I'm in need of experts' valuable suggestions.
Thanks in advance, and I'd appreciate being directed to the right path in designing this streaming application.
My aim is not to introduce a new layer between the gateway and the client. The ultimate aim is to customize the gateway for my product. For example, the user (client) connects to the gateway and tries to access a topic, say stock (which he is eligible to subscribe to); then he will be able to get the data stream. If he needs the share topic, he would register somewhere, and only after approval will he be eligible to view it. So authorization plays a role: it maintains a session, loads the customized data and allows the streaming.
Typically, I am trying to build a dashboard of streaming data, so only authorized streaming is allowed for a user. Also, he would otherwise be able to see all the topic names. Is it possible to use a proxy name, for example "1" or "2", as the value used from the client side? When it reaches the gateway, the gateway would verify the authorization, replace the value with the real topic name and establish the data stream.
Please let me know whether I've made my question clear. Your valuable suggestions and guidance will be most helpful for continuing my research.
Thanks
Krish
In case you wonder how the world has changed over the last five years to address this kind of need: you could now give a try to streamdata.io, a proxy which uses the SSE protocol (unidirectional) and is available in minutes. It would fit what's needed in the example perfectly, because I'm not sure WebSocket is the ideal answer here. Though it would work, it's basically like buying a harvester to go pick your kids up from school.
As a disclaimer, I should say that I work for Streamdata, and we see this kind of thing every day.
WebSocket is great if you need bidirectional streams of data, as in chats or games. Most of the time, when clients mainly expect updated data from the server, SSE is the better alternative to aggressive polling or deploying WebSocket, as it's easier to deploy (plain HTTP, etc.); a minimal sketch follows below. The rest of the settings (for security) can be done through the proxy interface.
Boom: it's available before you even realize it, it's secure, easy to maintain and more efficient than before. You have more time available to answer questions from the community over here; your team is happy; your boss loves you more than before and might even give you a raise. Life can be so simple sometimes!
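For comparison, a minimal SSE sketch in plain Node (the `/stream` endpoint and payload are illustrative): the server just keeps an HTTP response open and writes `data:` frames, and the browser side is a one-liner with `EventSource`:

```js
const http = require('http');

http.createServer((req, res) => {
  if (req.url !== '/stream') { res.writeHead(404); return res.end(); }
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });
  // Push an update every few seconds; a real app would push on events.
  const timer = setInterval(() => {
    res.write(`data: ${JSON.stringify({ ts: Date.now() })}\n\n`);
  }, 3000);
  req.on('close', () => clearInterval(timer));
}).listen(8000);

// Browser side:
//   const es = new EventSource('/stream');
//   es.onmessage = (e) => console.log(JSON.parse(e.data));
```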

Get client to act as server with websocket?

I am basically writing an almost purely client-side application (there is a web server which can be used to store some persistent data, but it's easier to forget about it), but as part of this I was looking to add some functionality akin to hosting a game.
The scenario would be: one person hosts the game via their browser (opens a TCP socket awaiting connections), then X other people connect to that server and join. The server would be in charge of receiving and sending data between clients.
So in this scenario, is it possible to host a websocket server within a webpage?
I was looking at trying to do something peer-to-peer style, but I don't think it is currently supported. It's not a major problem, though, as it's only going to be used for sending small amounts of text and some update messages between clients.
The WebSocket browser API is client only (for the foreseeable future).
In some sense, WebRTC is peer-to-peer, but even if the WebRTC API adds the ability to send arbitrary data, you still need a STUN/TURN server to establish the initial connection.
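The arbitrary-data ability mentioned above did later ship as WebRTC data channels. Here is a minimal sketch of the browser side, showing where the STUN server fits in; `sendViaSignaling` is a hypothetical helper standing in for your own signaling channel (e.g. a plain WebSocket to the web server) used to exchange the offer/answer and ICE candidates:

```js
// STUN server config: needed to discover public addresses and establish the
// initial peer-to-peer connection, as the answer above notes.
const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }],
});

// A data channel carries arbitrary text/binary between the two browsers.
const channel = pc.createDataChannel('game');
channel.onopen = () => channel.send('hello from the hosting peer');
channel.onmessage = (e) => console.log('peer said:', e.data);

// The offer and ICE candidates still travel over your signaling channel;
// `sendViaSignaling` is a hypothetical helper for that transport.
pc.onicecandidate = (e) => { if (e.candidate) sendViaSignaling(e.candidate); };
pc.createOffer()
  .then((offer) => pc.setLocalDescription(offer))
  .then(() => sendViaSignaling(pc.localDescription));
```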

Bi-directional communication with 1 socket - how to deal with collisions?

I have one app that consists of a "Manager" and a "Worker". Currently, the worker always initiates the connection, says something to the manager, and the manager sends the response.
Since there is a LOT of communication between the manager and the worker, I'm considering keeping a socket open between the two and doing the communication over it. I'm also hoping to initiate the interaction from both sides, enabling the manager to say something to the worker whenever it wants.
However, I'm a little confused as to how to deal with "collisions". Say the manager decides to say something to the worker, and at the same time the worker decides to say something to the manager. What will happen? How should such a situation be handled?
P.S. I plan to use Netty for the actual implementation.
"I'm also hoping to initiate the interaction from both sides - enabling the manager to say something to the worker whenever it wants."
Simple answer. Don't.
Learn from existing protocols: have a client and a server, and things will work out nicely. The worker can be the server and the manager can be a client. The manager can make numerous requests; the worker responds to them as they arrive.
Peer-to-peer can be complex, with no real value for the complexity.
I'd go for a persistent bi-directional channel between server and client.
If all you have is one server and one client, then there's no collision issue: if the server accepts a connection, it knows it's the client, and vice versa. Both can read and write on the same socket.
Now, if you have multiple clients and your server needs to send a request specifically to client X, then you need handshaking!
When a client boots, it connects to the server. Once this connection is established, the client identifies itself as being client X (the handshake message). The server now knows it has a socket open to client X and every time it needs to send a message to client X, it reuses that socket.
Lucky you, I've just written a tutorial (sample project included) on this precise problem. Using Netty! :)
Here's the link: http://bruno.linker45.eu/2010/07/15/handshaking-tutorial-with-netty/
Notice that in this solution, the server does not attempt to connect to the client. It's always the client who connects to the server.
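The tutorial above is written against Netty, but the handshake pattern itself is transport-agnostic; here is the same idea sketched with Node's `ws` for consistency with the rest of this page (the `{type, clientId}` message shape is an assumption):

```js
const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 9000 });

const clients = new Map(); // clientId -> socket, filled in by the handshake

wss.on('connection', (socket) => {
  socket.on('message', (raw) => {
    const msg = JSON.parse(raw);
    if (msg.type === 'handshake') {
      clients.set(msg.clientId, socket); // "I am client X"
      socket.on('close', () => clients.delete(msg.clientId));
      return;
    }
    // ...handle ordinary requests from an already-identified client here...
  });
});

// The server can now push to a specific client whenever it wants:
function sendToClient(clientId, payload) {
  const socket = clients.get(clientId);
  if (socket && socket.readyState === WebSocket.OPEN) {
    socket.send(JSON.stringify(payload));
  }
}
```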
If you were thinking about opening a socket every time you wanted to send a message, consider persistent connections instead: they avoid the overhead of connection establishment, consequently speeding up the data transfer rate N-fold.
I think you need to read up on sockets: you don't really get these kinds of problems. Other than deciding how to responsively handle both receiving and sending, this is generally done by threading your communications; depending on the app, you can take a number of approaches to it.
The correct link to the Handshake/Netty tutorial mentioned in brunodecarvalho's response is http://bruno.factor45.org/blag/2010/07/15/handshaking-tutorial-with-netty/
I would add this as a comment to his question but I don't have the minimum required reputation to do so.
If you feel like reinventing the wheel and don't want to use middleware...
Design your protocol so that the other peer's answers to your requests are always easily distinguishable from new requests from that peer. Then choose your network I/O strategy carefully. Whatever code is responsible for reading from the socket must first determine whether the incoming data is a response to something that was sent out or a brand-new request from the peer (by looking at the data's header, and at whether you've issued a request recently). You also need to maintain proper queueing so that the responses you send to the peer's requests are properly separated from the new requests you issue. A sketch of this request-tagging scheme follows.
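One concrete way to make those two cases distinguishable is a correlation-id envelope; the `kind`/`id`/`payload` format and the `handle` function below are assumptions, not a standard:

```js
// Hypothetical application handler for incoming requests.
const handle = (payload) => ({ ok: true, echoed: payload });

let nextId = 1;
const pending = new Map(); // id -> callback awaiting a response

// Tag every outgoing request with a fresh id and remember who's waiting.
function sendRequest(socket, payload, onResponse) {
  const id = nextId++;
  pending.set(id, onResponse);
  socket.send(JSON.stringify({ kind: 'request', id, payload }));
}

// Everything read off the socket goes through one dispatcher.
function onFrame(socket, raw) {
  const msg = JSON.parse(raw);
  if (msg.kind === 'response' && pending.has(msg.id)) {
    pending.get(msg.id)(msg.payload); // an answer to something we sent earlier
    pending.delete(msg.id);
  } else if (msg.kind === 'request') { // a brand-new request from the peer
    socket.send(JSON.stringify({
      kind: 'response', id: msg.id, payload: handle(msg.payload),
    }));
  }
}
```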