How does Dropbox's response queue work, exactly?

I am reading this write-up: https://medium.com/@narengowda/system-design-dropbox-or-google-drive-8fd5da0ce55b. In the Synchronization Service section, it says:
The Response Queues that correspond to individual subscribed clients are responsible for delivering the update messages to each client. Since a message will be deleted from the queue once received by a client, we need to create separate Response Queues for each client to be able to share an update message which should be sent to multiple subscribed clients.
The context is that we need a response queue to send file updates from one client to the other clients. I am confused by this statement. If Dropbox has 100 million clients, then based on the statement we need to create 100 million queues. That is unimaginable to me. For example, a Kafka cluster can support up to roughly 5K topics (https://stackoverflow.com/questions/32950503/can-i-have-100s-of-thousands-of-topics-in-a-kafka-cluster), so we would need 20K Kafka clusters in this case. Which queuing system can do 100 million "topics"?

I'm not sure, but I'd expect such notifications to reach clients via WebSockets only.
Additionally, as this Medium post states, if a client is not online, the messages might have to be persisted in a DB. When the client comes back online, it can request all updates after a certain timestamp, after which a WebSocket can be set up to facilitate future communication.
Happy to hear your thoughts on this.
P.S.: Most Dropbox system design blogs/vlogs have just copied from each other without going into low-level detail.
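To make that catch-up-then-subscribe flow concrete, here is a minimal client-side sketch; the /updates endpoint, the wss:// URL, and the message shapes are assumptions for illustration, not anything the blog specifies:

```typescript
// Sketch: catch up over HTTP from the persisted store, then hold a
// websocket open for live updates. Endpoint names are hypothetical.
async function syncClient(lastSeen: number, onUpdate: (u: unknown) => void) {
  // 1. Client comes online: request everything after its last-seen timestamp.
  const res = await fetch(`/updates?after=${lastSeen}`);
  const missed: unknown[] = await res.json();
  missed.forEach((u) => onUpdate(u));

  // 2. Future updates arrive as server pushes over the websocket.
  const ws = new WebSocket('wss://example.com/updates'); // hypothetical endpoint
  ws.onmessage = (ev) => onUpdate(JSON.parse(ev.data));
}
```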

Related

how to update retrieved data in the front-end when data is updated in the database?

I'm building a mobile app where a pull-down gesture on the UI initiates an update of existing data/posts (it also retrieves new posts if there are any, but that's not the point here). The server is stateless, meaning there are no sessions.
If the posts have been updated in the database, how do I let the front end know which posts need to be updated? The only way I could think of is to send a list of IDs of all retrieved posts to the server, and have it check whether any of the posts have been modified since the time they were fetched.
This, however, seems quite inefficient, as users might have stacked up hundreds of posts in some extreme cases, and it's most likely that only a few or none of the posts need to be updated. Issuing hundreds of DB requests could be a huge overhead.
There are at least two ways of doing this.
Long Polling
The client requests new information from the server.
The server keeps the connection open until there is new data to send.
Once the server gets the data, it sends it to the client and the connection is closed.
The client then sends a new request for information.
This is a continuous process (sketched below).
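A minimal client-side long-polling loop might look like the following; the GET /updates?since=<ts> endpoint is hypothetical and is assumed to hold the request open until new data arrives (200 with a JSON array) or a timeout elapses (204):

```typescript
// Minimal long-polling loop, client side.
async function pollForever(onUpdate: (items: unknown[]) => void): Promise<never> {
  let since = Date.now();
  while (true) {
    try {
      const res = await fetch(`/updates?since=${since}`);
      if (res.status === 200) {
        onUpdate(await res.json());
        since = Date.now();
      }
      // A 204 means the server gave up waiting with no news: loop and re-poll.
    } catch {
      // Network hiccup: back off briefly before the next request.
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }
  }
}
```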
WebSockets
Create a websocket connection, keep it open.
The server pushes any updates as and when they come.
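A server-side counterpart with the ws package, as a sketch only; the update shape and whatever hook ends up calling broadcastUpdate are assumptions:

```typescript
import { WebSocketServer, WebSocket } from 'ws';

// The server keeps every connection open and pushes updates as they happen.
// How the server learns about DB changes (broadcastUpdate's caller) is the
// tricky part mentioned below; this sketch assumes some app-level hook.
const wss = new WebSocketServer({ port: 8080 });

function broadcastUpdate(update: { postId: string; changedAt: number }): void {
  const payload = JSON.stringify(update);
  for (const client of wss.clients) {
    if (client.readyState === WebSocket.OPEN) client.send(payload);
  }
}
```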
Problems with both approaches
Either may take a significant amount of time to turn into a production-ready implementation.
Both of them require the server to be aware of any change in the database. This can be tricky as well.

Volume or frequency limitations of SQL Server Database Mail

I've created a nightly sync between two database applications for a small construction company and set up simple notifications using Database Mail to let a few people know whether the load was successful. Now that they see this notification is working, I've been asked to provide status updates to their clients as employees make changes to the work order throughout the day.
I've done some research and understand DB Mail is not designed for this type of feature, but I'm thinking the frequency will be small enough not to be a problem. I'm estimating 50-200 emails per day.
I couldn't find anything on the actual limitations of DB Mail, and I'm wondering if anyone has tried something similar in the past, or if I could be pushed in the right direction to send these emails using best practice.
If we're talking hundreds here, you can definitely go ahead. Take a peek at the Database Mail MSDN page. The current design (i.e. anything post-SQL 2000) was specifically built for large, high-performance enterprise implementations. Built on top of Service Broker (SQL Server's message queuing bus), it offers both asynchronous processing and scalability, with process isolation, clustering, and failover. One caveat is increased transaction log pressure, as messages, unlike in some other implementations, are ACID-protected by SQL Server, which in turn gives you full recoverability of the queues in case of failure.
If you're wondering what Service Broker can handle before migrating to a dedicated solution, there's a great MySpace case study. The most interesting fragment:
“We didn’t want to start down the road of using Service Broker unless we could demonstrate that it could handle the levels of messages that we needed to support our millions of users across 440 database servers,” says Stelzmuller. “When we went to the lab we brought our own workloads to ensure the quality of the testing. We needed to see if Service Broker could handle loads of 4,000 messages per second. Our testing found it could handle more than 18,000 messages a second. We were delighted that we could build our solution using Service Broker, rather than creating a custom solution on our own.”

Scaling websocket node server

I know this question has been partially asked before (How to Scale Node.js WebSocket Redis Server?), but I am wondering if there are any alternatives to Redis for rapidly sharing websocket objects between Node instances, specifically ws-type sockets (https://github.com/einaros/ws). I've tried Redis and ran into the issue that the websocket objects are cyclic and difficult to serialise. I then used Crockford's cycle.js (https://github.com/douglascrockford/JSON-js/blob/master/cycle.js); however, it seems to strip out the websocket object's methods, as I get an error from Node saying "Object object has no method send" after I have read the socket back from Redis and retrocycled it. Any help would be much appreciated.
Thanks in advance, James.
IMO you should use a message queue for that, e.g. RabbitMQ:
The application starts on Node A and Node B and connects to RabbitMQ.
Client A connects to Node A and subscribes to a queue named XXX.
Client B connects to Node B and subscribes to a queue named XXX.
Client A sends a message to the websocket server; the websocket server passes the message to Node A.
Node A publishes the message to RabbitMQ queue XXX.
Node B receives the message from RabbitMQ, as it is subscribed to queue XXX.
Node B sends the message to Client B, or publishes it to all connected clients on Node B.
So all you need is to put a message queue (RabbitMQ, ZeroMQ, etc.) in your architecture; a sketch follows.
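The key point is that you never serialise the socket objects themselves: only the messages travel through RabbitMQ, while each socket stays attached to the node that owns it. A rough sketch of one node process using amqplib and ws (the exchange name XXX, the port, and the fanout choice are my assumptions):

```typescript
import amqp from 'amqplib';
import { WebSocketServer, WebSocket } from 'ws';

async function main() {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertExchange('XXX', 'fanout', { durable: false });

  // Exclusive, auto-named queue: one per node process.
  const { queue } = await ch.assertQueue('', { exclusive: true });
  await ch.bindQueue(queue, 'XXX', '');

  const wss = new WebSocketServer({ port: 8080 });

  // Messages from RabbitMQ -> all websocket clients on THIS node.
  await ch.consume(queue, (msg) => {
    if (!msg) return;
    for (const client of wss.clients) {
      if (client.readyState === WebSocket.OPEN) client.send(msg.content);
    }
  }, { noAck: true });

  // Messages from a local websocket client -> RabbitMQ -> every node.
  wss.on('connection', (socket) => {
    socket.on('message', (data) => {
      ch.publish('XXX', '', Buffer.from(data as Buffer));
    });
  });
}

main().catch(console.error);
```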
There is a library which makes it easy to scale WebSockets across Node.js processes and machines; you can check it out:
https://github.com/ClusterWS/ClusterWS
When we speak of scalability, we expect or want to hear of linear performance gains. To be honest, though, this is not the case for most setups: their reliance on another server/service is too great, and bottlenecks form within the network you're trying to host for users.
As we explore options, we hear about databases, message queues, and brokers. These are fine to use, but as mentioned above, if the reliance on any of them is too great, you will undermine your setup in short order.
Design the WSS server to act solo (unless requirements are exceeded). You determine and set limits, and let the API server know about them. Say I have 10 chat rooms, each holding a maximum of 100 users, and benchmarking my WSS server proved it could hold 400-500 of them. With that information I'd assign 4-5 rooms per server. So if two people enter room #1, they are on WSS server #1; if all 10 chat rooms are full, WSS server #2 is now full as well, and an 11th room will need a WSS server #3, which covers up to the 15th room (see the sketch below).
The slowest part of the network would now just be your API server handling requests, though this may include the database as well.
If your requirements exceed the example, you can increase core power first, or add a second server with the help of an MQ or Redis pub/sub type setup.
Unfortunately there's no way to pack users perfectly, so if 3 rooms had 20 users each and all were sitting on WSS server #1, that would still leave that server with hundreds of user slots available; but is this really a problem?
It's possible those rooms could fill right up, so leave them their slots; but it could still be days until they max out, so programming something tailored to your needs will improve how cost-effective the setup is.
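For what it's worth, the static room-to-server arithmetic above could look something like this (5 rooms per server, per the benchmark figures in the example):

```typescript
// Hypothetical static room-to-server mapping, assuming the 4-5 rooms
// per server capacity worked out above (here: 5 rooms per server).
const ROOMS_PER_SERVER = 5;

// Rooms 1-5 -> server 1, rooms 6-10 -> server 2, rooms 11-15 -> server 3...
function wssServerForRoom(roomId: number): number {
  return Math.floor((roomId - 1) / ROOMS_PER_SERVER) + 1;
}

console.log(wssServerForRoom(1));  // 1
console.log(wssServerForRoom(11)); // 3
```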

Amazon SQS to funnel database writes

Assume I am building Netflix and I want to log each view by the user ID and the movie ID.
The format would be viewID, userID, timestamp.
However, in order to scale this, assume we're getting 1,000 views a second. Would it make sense to queue these views in SQS, and then have our queue readers dequeue them one by one and write them to the MySQL database? This way the database is not overloaded with write requests.
Does this look like something that would work?
Faisal,
This is a reasonable architecture; however, you should know that writing to SQS is going to be many times slower than writing to something like RabbitMQ (or any local message queue).
By default, SQS FIFO queues support up to 3,000 messages per second with batching, or up to 300 messages per second (300 send, receive, or delete operations per second) without batching. To request a limit increase, you need to file a support request.
That being said, starting with SQS wouldn't be a bad idea since it is easy to use and debug.
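As a sketch of the funnel (just one way to wire it, not an official pattern): receive up to 10 messages per SQS call, insert them into MySQL as a single multi-row write, and only then delete the batch. The queue URL, table layout, and db handle are placeholders:

```typescript
import {
  SQSClient,
  ReceiveMessageCommand,
  DeleteMessageBatchCommand,
} from '@aws-sdk/client-sqs';

const sqs = new SQSClient({ region: 'us-east-1' });
const QueueUrl = 'https://sqs.us-east-1.amazonaws.com/123456789012/views'; // hypothetical

// Drain one batch of view events and write them to MySQL in one INSERT,
// so the database sees roughly 1 write per 10 views.
async function drainOnce(db: { query: (sql: string, values?: unknown) => Promise<unknown> }) {
  const { Messages } = await sqs.send(new ReceiveMessageCommand({
    QueueUrl,
    MaxNumberOfMessages: 10, // SQS maximum per receive
    WaitTimeSeconds: 20,     // long polling
  }));
  if (!Messages || Messages.length === 0) return;

  // Each body is assumed to be "viewID,userID,timestamp" per the question.
  const rows = Messages.map((m) => (m.Body ?? '').split(','));
  await db.query('INSERT INTO views (view_id, user_id, ts) VALUES ?', [rows]);

  // Only delete after the insert succeeded.
  await sqs.send(new DeleteMessageBatchCommand({
    QueueUrl,
    Entries: Messages.map((m, i) => ({ Id: String(i), ReceiptHandle: m.ReceiptHandle! })),
  }));
}
```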
Additionally, you may want to investigate MongoDB for logging...check out the following references:
MongoDB is Fantastic for Logging
http://blog.mongodb.org/post/172254834/mongodb-is-fantastic-for-logging
Capped Collections
http://blog.mongodb.org/post/116405435/capped-collections
Using MongoDB for Real-time Analytics
http://blog.mongodb.org/post/171353301/using-mongodb-for-real-time-analytics

How to retract a message in RabbitMQ?

I've got something like a job queue over RabbitMQ and, upon a request to cancel a job, I'd like to retract the tasks that have not yet started processing (their messages have not been ack'd), which corresponds to retracting these messages from the queues that they've been routed to.
I haven't found this functionality in AMQP or in the RabbitMQ API; perhaps I haven't searched well enough? Or will I have to use a workaround (it's not hard, but still)?
I would solve this scenario by having the worker check some sort of authoritative data source to determine whether the job should proceed. For example, the worker would check the job's status in a database to see if the job was already canceled.
For scenarios where jobs may be processed faster than the authoritative store can be updated and read, a less strongly guaranteed data store that trades those guarantees for speed may be useful.
An example of this would be to use Redis as the store for canceling processing of a message instead of a relational DB like MySQL. Redis is very fast, but makes fewer guarantees regarding the data it holds, whereas MySQL is much slower, but offers more guarantees about the data it holds.
In the end, the concept of checking with another source for whether or not to process a message is the same, but the way you implement that depends on your particular scenario.
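A sketch of that check with Redis in front of a RabbitMQ consumer; the jobs queue name, the cancelled:<jobId> key convention, and the message shape are all illustrative assumptions:

```typescript
import Redis from 'ioredis';
import amqp from 'amqplib';

// Assumes the canceling side does `SET cancelled:<jobId> 1` in Redis
// before the worker reaches the message.
async function worker() {
  const redis = new Redis();
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue('jobs', { durable: true });

  await ch.consume('jobs', async (msg) => {
    if (!msg) return;
    const { jobId } = JSON.parse(msg.content.toString());

    // The authoritative check: ack away cancelled jobs without processing.
    if (await redis.get(`cancelled:${jobId}`)) {
      ch.ack(msg);
      return;
    }
    // ...process the job...
    ch.ack(msg);
  });
}
```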
RabbitMQ doesn't let you modify or delete messages after they've been enqueued. For that, you want some kind of database to hold the state of each job, and to use RabbitMQ to notify interested parties of changes in that state.
For lowish volumes, you can kludge it together with a queue per job. Create the queue, post the job description to the queue, and announce the name of the queue to the workers. If the job needs to be cancelled before it is processed, delete the job's queue; when the workers come to fetch the job description, they'll notice the queue has vanished.
Lighter-weight and generally better would be to use Redis or another key/value store to hold the job state (with a deleted or absent record meaning a cancelled or nonexistent job), and to use RabbitMQ to notify about new/removed/changed records in the key/value store.
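A sketch of the queue-per-job kludge with amqplib; the job.<id> naming is my own convention for illustration:

```typescript
import type { Channel } from 'amqplib';

// Submit: one queue per job, holding the single job-description message.
async function submitJob(ch: Channel, jobId: string, description: object) {
  const queue = `job.${jobId}`;
  await ch.assertQueue(queue, { durable: true });
  ch.sendToQueue(queue, Buffer.from(JSON.stringify(description)));
  // Announce `queue` to the workers out of band (e.g. on a shared queue).
  return queue;
}

// Cancelling before a worker picks the job up = deleting its queue.
// A worker that later tries to consume `job.<id>` gets a channel error
// (404, no such queue) and can treat the job as cancelled.
async function cancelJob(ch: Channel, jobId: string) {
  await ch.deleteQueue(`job.${jobId}`);
}
```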
There are at least two ways to achieve this:
basic.reject will requeue the message if requeue=true is set (otherwise the message is discarded).
(Supported since RabbitMQ 2.0.0; see http://www.rabbitmq.com/blog/2010/08/03/well-ill-let-you-go-basicreject-in-rabbitmq/.)
basic.recover will ask the broker to redeliver unacked messages on the channel.
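With amqplib, the two calls look like this (queue name made up; note that blindly rejecting with requeue=true as below would redeliver forever, it is only here to show the verbs):

```typescript
import amqp from 'amqplib';

async function demo() {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue('tasks');

  await ch.consume('tasks', (msg) => {
    if (!msg) return;
    // basic.reject with requeue=true: put the message back on the queue.
    ch.reject(msg, true);
  });

  // basic.recover: ask the broker to redeliver all unacked messages
  // on this channel.
  await ch.recover();
}
```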
You need to subscribe to all the queues to which messages have been routed, and consume them with ack.
For instance, if you publish to a topic exchange with "test" as the routing key, and there are 3 persistent queues which subscribe to "test", you would need to consume those three queues. It might be better to add another queue which your consumer processes would also listen to, and tell them to ignore those messages.
An alternative, since you are using RabbitMQ, is to write a custom exchange plugin that will accept some out-of-band instruction to clear all queues. For instance, you might have that exchange read a special message header that tells it to clear all queues to which the message is destined. This does require writing Erlang code, but there are 4 different exchange types implemented, so you would only need to copy the most similar one and write the code for the new behaviours. If you only use custom headers for this, then the body of the message can be a normal message for the consumers.
To sum up:
1) the publisher needs to consume the messages itself
2) the publisher can send a special message in a special queue to tell consumers to ignore the message
3) the publisher can send a special message to a custom exchange that will clear any existing messages from the queues before sending this special message to consumers.