Locks and batch fetch messages with RabbitMQ

I'm trying to use RabbitMQ in a more unconventional way (though at this point I can pick any other message queue implementation if needed). Instead of letting Rabbit push messages to my consumers, each consumer connects to a queue and fetches a batch of N messages (during which it consumes some and possibly rejects some), after which it jumps to another queue, and so on. This is done for redundancy: if some consumers crash, all messages are still guaranteed to be consumed by some other consumer.
The problem is that I have multiple consumers and I don't want them to compete over the same queue. Is there a way to guarantee a lock on a queue? If not, can I at least make sure that if two consumers are connected to the same queue, they don't read the same message? Transactions might help me to some degree, but I've heard talk that they'll be removed from RabbitMQ.
Other architectural suggestions are welcomed too.
Thanks!
EDIT:
As pointed out in the comments, there's a particularity in how I need to process the messages. They only make sense taken in groups, and there's a high probability that related messages are clumped together in a queue. For example, if I pull a batch of 100 messages, there's a high probability that I'll be able to do something with messages 1-3, 4-5, 6-10, etc. If I fail to find a group for some messages, I'll resubmit them to the queue. A work queue wouldn't work, because it would spread messages from the same group across multiple workers that wouldn't know what to do with them.

Have you had a look at this free online book on Enterprise Integration Patterns?
It sounds like you really need a workflow where you have a batcher component before the messages get to your workers. With RabbitMQ there are two ways to do that. Either use an exchange type (and message format) that can do the batching for you, or have one queue, and a worker that sorts out batches and places each batch on its own queue. The batcher should probably also send a "batch ready" message to a control queue so that a worker can discover the existence of the new batch queue. Once the batch is processed the worker could delete the batch queue.
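A minimal in-memory sketch of that batcher workflow, in Python. Everything here is hypothetical: `ToyBroker` stands in for RabbitMQ (it only has named FIFO queues), messages are assumed to be dicts with a `batch_id` key, and the queue names `work`, `batch-ready` and `batch.<id>` are made up. It also assumes a batch is complete once the intake queue is drained; a real batcher would need an explicit rule, such as an expected count or an end-of-batch marker.

```python
from collections import defaultdict, deque


class ToyBroker:
    """Hypothetical stand-in for a broker: named FIFO queues only."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def publish(self, queue, msg):
        self.queues[queue].append(msg)

    def get(self, queue):
        q = self.queues.get(queue)  # .get() does not create missing queues
        return q.popleft() if q else None

    def delete_queue(self, queue):
        self.queues.pop(queue, None)


def run_batcher(broker, intake="work", control="batch-ready"):
    """Sort intake messages into one queue per batch, then announce each batch."""
    batch_queues = []
    while (msg := broker.get(intake)) is not None:
        batch_queue = f"batch.{msg['batch_id']}"
        if batch_queue not in batch_queues:
            batch_queues.append(batch_queue)
        broker.publish(batch_queue, msg)
    for batch_queue in batch_queues:
        broker.publish(control, batch_queue)  # "batch ready" control message


def run_worker(broker, control="batch-ready"):
    """Discover batch queues via the control queue, drain each, then delete it."""
    results = {}
    while (batch_queue := broker.get(control)) is not None:
        items = []
        while (msg := broker.get(batch_queue)) is not None:
            items.append(msg["payload"])
        results[batch_queue] = items
        broker.delete_queue(batch_queue)  # batch processed: drop its queue
    return results
```

With a real broker the same roles would map onto declaring a queue per batch, publishing that queue's name to a control queue, and deleting the batch queue once the worker is done with it.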
If you have control over the message format, you might be able to get RabbitMQ to do the batching implicitly in a couple of ways. With a topic exchange, you could make sure that the routing key on each message has the format work.batchid.something, and then a worker that learns of the existence of batch xxyzz would bind its queue with a binding key like #.xxyzz.# so that it only receives those messages. No republishing needed.
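To make the binding-key idea concrete, here is a small Python sketch of topic-exchange matching semantics: words are separated by dots, `*` matches exactly one word and `#` matches zero or more. It illustrates the rules only; it is not RabbitMQ's implementation, and the routing keys are made-up examples of the work.batchid.something format.

```python
def topic_matches(binding_key: str, routing_key: str) -> bool:
    """Illustrative topic matching: '*' = exactly one word, '#' = zero or more."""
    pattern = binding_key.split(".")
    words = routing_key.split(".")

    def match(i: int, j: int) -> bool:
        if i == len(pattern):
            return j == len(words)  # pattern used up: the key must be too
        if pattern[i] == "#":
            # '#' may swallow zero or more of the remaining words
            return any(match(i + 1, k) for k in range(j, len(words) + 1))
        if j == len(words):
            return False  # key used up but pattern is not
        if pattern[i] in ("*", words[j]):
            return match(i + 1, j + 1)
        return False

    return match(0, 0)
```

If every routing key really has exactly three words, `*.xxyzz.*` is a stricter alternative to `#.xxyzz.#`: it rejects keys where xxyzz appears in an unexpected position or the word count is off.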
The other way is to include a batch id in a header and use the newer headers exchange type. Of course you can also implement your own custom exchange types if you are willing to write a small amount of Erlang code.
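For the headers-exchange variant, the binding carries the header values to match plus an `x-match` argument (`all` by default, or `any`). The Python function below is a toy model of that matching rule, not RabbitMQ's code; the `batch-id` and `kind` header names are hypothetical, and other `x-`-prefixed binding arguments are simply ignored here.

```python
def headers_match(binding_args: dict, headers: dict) -> bool:
    """Toy headers-exchange matching.

    'x-match': 'all' requires every listed header to be present and equal;
    'x-match': 'any' requires at least one. Other 'x-' keys are ignored.
    """
    mode = binding_args.get("x-match", "all")
    wanted = {k: v for k, v in binding_args.items() if not k.startswith("x-")}
    hits = [k in headers and headers[k] == v for k, v in wanted.items()]
    return all(hits) if mode == "all" else any(hits)
```

A worker that learns about batch xxyzz would then bind its queue with arguments such as `{"x-match": "all", "batch-id": "xxyzz"}`.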
I do recommend checking the book though, because it gives a better overview of messaging architecture than the typical worker queue concept that most people start with.

Have your consumers pull from just one queue. They will be guaranteed not to share messages (Rabbit will round-robin the messages among the currently-connected consumers) and it's heavily optimized for that exact usage pattern.
It's ready-to-use, out of the box. In the RabbitMQ docs it's called the Work Queue model. One queue, multiple consumers, with none of them sharing anything. It sounds like what you need.
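A toy Python model of the Work Queue behaviour described above: each message goes to exactly one of the currently connected consumers, in rotation, and messages wait in the queue while nobody is connected. It deliberately ignores prefetch and acknowledgements, which also influence how the real broker distributes messages, and it does not model requeueing the unacknowledged messages of a consumer that disconnects.

```python
from collections import deque


def work_queue(events):
    """events: ("join", name), ("leave", name) or ("msg", payload)."""
    connected = deque()  # rotation order of the connected consumers
    pending = deque()    # messages waiting for a consumer
    received = {}
    for kind, value in events:
        if kind == "join":
            connected.append(value)
            received.setdefault(value, [])
        elif kind == "leave":
            connected.remove(value)
        else:
            pending.append(value)
        # Hand out waiting messages, one consumer at a time, in rotation
        while pending and connected:
            consumer = connected[0]
            connected.rotate(-1)  # next message goes to the next consumer
            received[consumer].append(pending.popleft())
    return received
```

No message ever appears in two consumers' lists, which is the "none of them sharing anything" property.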

You can set a channel- or consumer-level prefetch count (basic.qos) to consume messages in batches. To resubmit messages, use the basic.reject AMQP method; rejected messages can either be requeued or forwarded to a dead letter queue. Multiple consumers pulling messages from the same queue is not an issue, as the AMQP basic.get method is synchronized to handle concurrent consumers.
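To illustrate why concurrent pulls don't hand out the same message, here is a toy Python model of a single queue with basic.get/ack/reject-like operations guarded by a lock. `ToyQueue`, `fetch_batch` and `consume_all` are hypothetical names, not a client-library API; rejected-and-requeued messages go back to the head of the queue as an approximation of the broker putting them back near their original position.

```python
import threading
from collections import deque


class ToyQueue:
    """Hypothetical in-memory stand-in for one broker queue."""

    def __init__(self, messages):
        self._ready = deque(messages)
        self._unacked = {}
        self._next_tag = 1
        self._lock = threading.Lock()  # serialises concurrent get() calls

    def get(self):
        """Like basic.get: return (delivery_tag, message), or None if empty."""
        with self._lock:
            if not self._ready:
                return None
            tag, self._next_tag = self._next_tag, self._next_tag + 1
            self._unacked[tag] = self._ready.popleft()
            return tag, self._unacked[tag]

    def ack(self, tag):
        with self._lock:
            del self._unacked[tag]

    def reject(self, tag, requeue=True):
        """Like basic.reject: requeue, or drop (a real broker may dead-letter)."""
        with self._lock:
            msg = self._unacked.pop(tag)
            if requeue:
                self._ready.appendleft(msg)


def fetch_batch(queue, n):
    """Pull up to n messages, one get() at a time."""
    batch = []
    for _ in range(n):
        item = queue.get()
        if item is None:
            break
        batch.append(item)
    return batch


def consume_all(queue, n, seen, seen_lock):
    """Keep fetching batches of n and acking them until the queue is empty."""
    while batch := fetch_batch(queue, n):
        for tag, msg in batch:
            with seen_lock:
                seen.append(msg)
            queue.ack(tag)
```

Running several `consume_all` threads against one `ToyQueue` consumes every message exactly once, even though the threads' batches interleave.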
https://groups.google.com/forum/#!topic/rabbitmq-users/hJ8f5du-GCA

Related

How do multiple developers use the same queue for development?

We use SQS for queueing use cases in our company. All developers connect to the same queue for local development. If we produce some messages for testing in local development, they can be consumed by another person's locally running consumer if that person has the app running at the same time.
How do you make sure that messages produced by one person don't get lost by being consumed by another person's locally running consumer? Is using a different queue for each person the only solution? What is the standard industry practice for avoiding this?
This is very open-ended, IMO. I would recommend adding some context about how you're using SQS.
But from what I could understand:
Yes, I would recommend creating queues per "developer"
OR
Although not elegant, you could add an SQS message attribute (metadata separate from the message body) containing the developer's username.
Each developer should then only process a message if it's meant for them. Arguably, you could also add a flag in the message itself, but I am not sure about the constraints on your message format. Message attributes are meant for situations like this, where you want to know whether you need to process a message before even parsing the message body.
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-message-metadata.html#sqs-message-attributes
But you'll have to increase maxReceiveCount (in the queue's redrive policy) to a high number, so that messages don't move to the dead letter queue if you have configured one. This is not foolproof; it only decreases the chances of your messages being deleted by someone else. If, say, 10 people receive the message and don't delete it because the username isn't theirs, and maxReceiveCount is 8, it will still move to the DLQ and cause unnecessary confusion.
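A toy Python model of the attribute-based filtering and of why the receive count matters. It is not the AWS SDK: `Message`, `receive` and `poll_once` are hypothetical, the `developer` attribute name is an assumption, and real SQS makes an undeleted message visible again only after its visibility timeout expires rather than immediately.

```python
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    attributes: dict        # stands in for SQS message attributes
    receive_count: int = 0  # stands in for the message's receive count


def receive(queue, dlq, max_receive_count):
    """Toy receive under a redrive policy: a message whose receive count
    would exceed max_receive_count goes to the DLQ instead of being returned."""
    while queue:
        msg = queue[0]
        if msg.receive_count >= max_receive_count:
            dlq.append(queue.pop(0))
            continue
        msg.receive_count += 1
        return msg
    return None


def poll_once(me, queue, dlq, max_receive_count):
    """Process (and delete) the next message only if it is addressed to me."""
    msg = receive(queue, dlq, max_receive_count)
    if msg is None:
        return None
    if msg.attributes.get("developer") == me:
        queue.remove(msg)  # the equivalent of deleting it after processing
        return msg.body
    return None  # not mine: left in the queue, but its receive count grew
```

Eight polls by someone else are enough to push alice's message into the DLQ before she ever sees it when the limit is 8, which is the failure mode described above.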

Beanstalkd to ZeroMQ: is it possible to distribute work in the same way?

A common beanstalkd workflow would be to have many workers listening for jobs on a queue/tube, locking that job while they process it, and deleting that job so that no other workers can re-process it. If the job fails (eg. resources are unavailable to complete processing) the job can slip back onto the queue for another worker to pick up the job.
Is this approach possible with ZeroMQ? E.g., using the pub/sub model, can multiple subscribers receive the same job and process it at the same time? Would push/pull or req/rep provide a similar setup?
I'm certain ZeroMQ can provide this for you. However, keep in mind that ZeroMQ is not really a queue; it's an advanced networking library. Naturally, with the provided primitives, you can do what you describe.
Your specific case seems like it could be implemented as a pub/sub system, if you don't mind having the same work done many times over. I recommend reading the ZeroMQ guide, especially chapter 5.
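The difference between the patterns mentioned in the question can be sketched without ZeroMQ itself: with pub/sub every subscriber gets every job, while a PUSH socket hands jobs round-robin to its connected PULL peers. This is a toy Python model of the two delivery patterns, not pyzmq; note that neither pattern has beanstalkd's reserve/release step, so a job a worker fails on is not handed back automatically.

```python
from itertools import cycle


def deliver(jobs, workers, pattern):
    """Toy model of job delivery.

    'pubsub'   - every subscriber receives every job (work is duplicated)
    'pushpull' - jobs are handed out round-robin, one worker per job
    """
    got = {w: [] for w in workers}
    if pattern == "pubsub":
        for job in jobs:
            for w in workers:
                got[w].append(job)
    elif pattern == "pushpull":
        turn = cycle(workers)
        for job in jobs:
            got[next(turn)].append(job)
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    return got
```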
Although I'm certain you can do what you describe with ZeroMQ, I would first search for a queue which does this already.

Message queuing solution for millions of topics

I'm thinking about system that will notify multiple consumers about events happening to a population of objects. Every subscriber should be able to subscribe to events happening to zero or more of the objects, multiple subscribers should be able to receive information about events happening to a single object.
I think some message queuing system will be appropriate in this case, but I'm not sure how to handle the fact that I'll have millions of objects; using a separate topic for each object does not sound good [or is it just fine?].
Can you please suggest an approach I should take, and maybe even an open-source message queuing system that would be reasonable?
A few more details:
there will be thousands of subscribers [meaning not that many],
subscribers will subscribe to tens or hundreds of objects each,
there will be ~5-20 million objects,
events themselves don't have to carry any payload; the information that the object was changed is enough,
the vast majority of objects will never be subscribed to,
events occur at a maximum rate of a few hundred per second,
ideally the server should run under Linux and be able to integrate with the rest of the ecosystem via HTTP long-poll [using Node.js? continuations under Jetty?].
Thanks in advance for your feedback, and sorry for the somewhat vague question!
I can highly recommend RabbitMQ. I have used it in a couple of projects before, and in my experience it is very reliable and offers a wide range of configurations. RabbitMQ is an open-source (Mozilla Public License) message broker that implements the Advanced Message Queuing Protocol (AMQP) standard.
As documented on the RabbitMQ web-site:
RabbitMQ can potentially run on any platform that Erlang supports, from embedded systems to multi-core clusters and cloud-based servers.
... meaning that an operating system like Linux is supported.
There is a library for node.js here: https://github.com/squaremo/rabbit.js
It comes with an HTTP-based API for management and monitoring of the RabbitMQ server, including a command-line tool and a browser-based user interface; see http://www.rabbitmq.com/management.html.
In the projects I have worked on, I communicated with RabbitMQ from C# using two different wrappers, EasyNetQ and Burrow.NET. Both are excellent wrappers, but I ended up preferring Burrow.NET, as it is easier and more obvious to work with (it doesn't do a lot of magic under the hood) and provides good flexibility to inject loggers, serializers, etc.
I have never worked with the number of objects you are going to work with; I have worked with thousands, not millions. However, no matter how many objects I have been playing around with, RabbitMQ has always been very stable and has never been the source of errors in the system.
So to sum up - RabbitMQ is simple to use and setup, supports AMQP, can be managed via HTTP and what I like the most - it's rock solid.
Break up the topics by specific events, e.g. an "object updated" topic and an "object deleted" topic, so clients only have to subscribe to the finite number of event-based topics they are interested in.
Inject headers into your messages when you publish them, and put intelligence into the clients to use these headers as message selectors. For example, a client knows the list of objects it is interested in; if you identify each object by an "id", the id can be a header, and the client uses the "id" header to determine whether it is interested in the message.
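A toy Python sketch of that scheme: the number of topics stays fixed (one per event type) no matter how many millions of objects exist, and the object id travels as a header that the client selects on. The message shape and the `object.<event>` topic names are hypothetical. Filtering in the client means each client still receives every event on its topics and discards most of them; broker-side selectors (for example JMS message selectors in ActiveMQ) would avoid shipping that traffic.

```python
def make_event(event_type: str, object_id: int) -> dict:
    """Hypothetical message shape: topic per event type, object id as a header."""
    return {"topic": f"object.{event_type}", "headers": {"id": object_id}}


class Subscriber:
    def __init__(self, topics, object_ids):
        self.topics = set(topics)          # a handful of event-type topics
        self.object_ids = set(object_ids)  # tens to hundreds of objects

    def wants(self, event: dict) -> bool:
        # Topic subscription first, then the id header acts as the selector
        return event["topic"] in self.topics and event["headers"]["id"] in self.object_ids
```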
Depending on your requirements, you may also want to ensure guaranteed delivery, so that a client receives messages even if it goes offline and comes back later.
The options I would recommend off the top of my head are ActiveMQ, RabbitMQ and Redis pub/sub (I haven't really worked with Redis pub/sub, so please do your due diligence).
Finally, here are some performance benchmarks for RabbitMQ and Redis.
I just saw that you only have a few hundred messages per second being pushed out; this is not a big deal for ActiveMQ. I have been using AMQ on a system that processes 240 messages per second, and it works just fine. I use a thread pool of workers to process the messages asynchronously, though. Look at a framework like Akka if you are in Java land; if not, stick with Node.js and the cool ecosystem around it.
If it has to be open source, I'd go for ActiveMQ plus an application server to provide the JMS functionality for topics; it also has Ajax support, so you can access the topics from your client.
So you would use the JMS infrastructure to publish the topics for the objects, and you can create topics as you need them.
Besides, by using a Java application server you may be able to take advantage of clustering, load balancing and other high-availability features (depending on the selected product).
Hope that helps!!!
Since your messages are very small, you might want to consider MQTT, which is designed for small devices, although it works fine on powerful devices as well. The key consideration is the low overhead: basically a 2-byte header for a small message. You probably can't use a simple or open-source MQTT server due to your volume; you probably need a heavy-duty dedicated appliance like MessageSight to handle it.
Some more details on your application would certainly help. Also you don't mention security at all. I assume you must have some needs in this area.
I'm not sure about your work environment, but here are my thoughts. Can you identify each object with a unique ID in your system? If so, you can have a topic per event type: for example, you may want to track object deletion events, object update events and so on, so you create one topic for each event type. Whenever the corresponding event happens to an object, its ID is published on that topic. This limits the number of topics you need.
The second part of your problem is that different subscribers want to subscribe to different objects, so not every subscriber is interested in the events of every object. This part maps onto the message selector (filtering) mechanism provided by the messaging framework. Basically, you need to determine on what basis a subscriber is interested in a particular object and use that as the message filtering criterion. It could be anything: object type, object state, etc. Ultimately, your system would consist of one topic for each event type, with someone publishing event messages carrying {object-type:object-id} information. Subscribers could subscribe to any topic with a filtering criterion.
If the above solution works for you, you can use any messaging solution: ActiveMQ, WMQ, RabbitMQ.

Which message queue can handle private queues that survive subscriber disconnects?

I have some requirements for a system in need of a message queue:
The subscribers shall get individual queues.
The individual queues shall NOT be deleted when the subscriber disconnects
The subscriber shall be able to reconnect to its own queue if it loses connection
Only the subscriber shall be able to use the queue assigned to it
Nice to have: the queues survive a server restart
Can RabbitMQ be used to implement this, and in that case how?
I have only recently started using Rabbit but I believe your requirements can be addressed fairly easily.
1) I have implemented specific queues for individual subscribers by having the subscriber declare the queue (and related routing key) using its machine name as part of the queue name. The exchange takes care of routing messages appropriately by way of the binding/routing keys. In my case, all subscribers get a copy of the same message posted by the publisher and an arbitrary number of subscribers can declare their own queues and start receiving messages.
2) That's pretty much standard. If you declare a queue, it will remain on the broker (bound to the exchange) as long as it is not declared exclusive or auto-delete, and if it is declared durable it will survive broker restarts. In any case, your subscriber should declare the queue at startup (queue.declare in AMQP, QueueDeclare() in the .NET client) to ensure that it exists; when the subscriber disconnects, the queue will remain.
3) If the queue is there and a subscriber is listening to that queue by name then there's no reason why it shouldn't be able to reconnect.
4) I haven't really delved into the security aspects of Rabbit yet. There may be a means of securing individual queues, though I'll let someone else comment on this as I'm no authority.
5) See (2). Messages will also survive a restart if they are published as persistent (delivery mode 2) to a durable queue, since they are then written to disk. This incurs a performance penalty because of the disk I/O, but that's to be expected.
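A toy Python model of points 1 to 3: each subscriber gets its own queue named after its machine, a fanout-style exchange copies every message into every declared queue, and because the queue is not deleted on disconnect, a subscriber that re-declares it on reconnect finds the messages that arrived while it was away. `ToyFanoutBroker` and the `updates.<host>` naming scheme are hypothetical; broker restarts (point 5) and permissions (point 4) are not modelled.

```python
import socket


class ToyFanoutBroker:
    """Hypothetical in-memory broker with one fanout exchange.

    Queues persist when their consumer goes away, like a RabbitMQ queue
    that is neither exclusive nor auto-delete.
    """

    def __init__(self):
        self.queues = {}

    def declare_queue(self, name: str) -> None:
        # Idempotent: re-declaring on reconnect keeps the pending messages
        self.queues.setdefault(name, [])

    def publish(self, message: str) -> None:
        for pending in self.queues.values():
            pending.append(message)

    def drain(self, name: str) -> list:
        messages, self.queues[name] = self.queues[name], []
        return messages


def queue_name_for(host=None) -> str:
    # Per-subscriber queue named after the machine (hypothetical scheme)
    return f"updates.{host or socket.gethostname()}"
```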
So basically, yes. Rabbit can do as you ask. In terms of 'how', there are varying degrees of 'how'. Will happily try to provide you with code-level answers should you have trouble implementing any of the above. In the meantime, and if you haven't already done so, I suggest reading through the docs:
http://www.rabbitmq.com/documentation.html
HTH. Steve

Does RabbitMq do round-robin from the exchange to the queues

I am currently evaluating message queue systems and RabbitMq seems like a good candidate, so I'm digging a little more into it.
To give a little context, I'm looking to have something like one exchange load-balancing message publishing across multiple queues. I don't want to replicate the messages, so a fanout exchange is not an option.
Also, the reason I'm considering multiple queues rather than one queue doing round-robin across the consumers is that I don't want our single point of failure to be at the queue level.
It sounds like I could add some logic on the publisher side to simulate that behavior by varying the routing key and having the appropriate bindings in place. But that's a passive approach that wouldn't take the pace of message consumption on each queue into account, potentially filling up one queue if the consumer applications for that queue are dead.
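A minimal Python sketch of that publisher-side approach, assuming a direct exchange with queue i bound under the hypothetical routing key `work.<i>`. `RotatingPublisher` rotates blindly, which is exactly the passivity the question worries about. `least_loaded` is a depth-aware alternative that assumes the publisher already has per-queue message counts (for example from a passive queue.declare, whose reply includes the message count); those counts are stale as soon as they are read, so this is a heuristic, not a guarantee.

```python
from itertools import count


class RotatingPublisher:
    """Publisher-side round robin over N queues via the routing key."""

    def __init__(self, n_queues: int):
        self.n_queues = n_queues
        self._seq = count()

    def next_routing_key(self) -> str:
        # Ignores queue depth: a queue with dead consumers keeps filling up
        return f"work.{next(self._seq) % self.n_queues}"


def least_loaded(depths: dict) -> str:
    """Pick the routing key whose queue currently reports the fewest messages.

    `depths` maps routing key -> message count, however it was obtained.
    """
    return min(depths, key=depths.get)
```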
I was looking for a more proactive way on the exchange side, one that would decide where to send the next message based on each queue's size or something of that nature.
I read about Alice and the available RESTful APIs, but that seems like a heavy-duty solution for making fast routing decisions.
Does anyone know whether round-robin between the exchange and the queues is feasible with RabbitMQ? Thanks.
Exchanges are generally stateless in the AMQP model, though there have been some recent experiments in stateful exchanges now that there's both a system for managing RabbitMQ plugins and for providing new experimental exchange types.
There's nothing that does quite what you want, I don't think, though I'm not completely sure I understand the requirement. Aside from the single-point-of-failure point, would having a single queue with workers reading from it solve your problem? If so, then your problem reduces to configuring RabbitMQ in an HA configuration that permits you to use that solution. There are a couple of approaches to doing that: either use HALinux and a shared store to get active/passive HA with quick failover, or set up more than one parallel broker and deduplicate on the client, perhaps using redis or similar to do so.
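The "parallel brokers plus client-side deduplication" option can be sketched in a few lines of Python. It assumes the publisher stamps every message with a unique ID (for example the AMQP message-id property) and sends it to all brokers; `Deduplicator` and `consume` are hypothetical names. The in-process set is a stand-in for the shared store: with Redis, an atomic `SET key value NX` with an expiry would play the same role across several clients.

```python
class Deduplicator:
    """Remember message IDs so a copy from a second broker is skipped."""

    def __init__(self):
        self._seen = set()

    def first_time(self, message_id: str) -> bool:
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True


def consume(deliveries, dedup):
    """Process each logical message once, whichever broker delivered it first."""
    processed = []
    for broker, message_id in deliveries:
        if dedup.first_time(message_id):
            processed.append((broker, message_id))
    return processed
```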
I suggest asking your question again on the rabbitmq-discuss mailing list, where more people will be able to offer suggestions, and where the discussion can be archived for posterity.
Agree with Tony on the approach.
Here is a 'mashup' of RabbitMQ and Redis that you could use instead of rolling your own:
http://xing.github.com/beetle/
One built-in way to distribute messages from an exchange across queues, though not exactly round robin, is consistent hashing, via the rabbitmq_consistent_hash_exchange plugin (shipped with RabbitMQ, but it has to be enabled).
How to:
https://medium.com/@eranda/rabbitmq-x-consistent-hashing-with-wso2-esb-27479b8d1d21
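A paper that explains the technique: queues are placed on a circle with a weighted distribution, and each message goes to the queue nearest to the hash of its routing key on that circle, so sending random routing keys spreads messages across the queues.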
http://www8.org/w8-papers/2a-webserver/caching/paper2.html
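A minimal Python sketch of the consistent-hashing idea from the paper: each queue gets a number of points on a ring proportional to its weight, and a routing key goes to the first queue point at or after the key's own hash, wrapping around the end of the ring. It is not the plugin's code and uses its own hash function (SHA-256 here); the queue names and weights are made up.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # First 8 bytes of SHA-256 as an unsigned position on the ring
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, weights: dict):
        # `weight` points per queue, labelled "q1#0", "q1#1", ...
        self._ring = sorted(
            (_hash(f"{queue}#{i}"), queue)
            for queue, weight in weights.items()
            for i in range(weight)
        )
        self._points = [point for point, _ in self._ring]

    def route(self, routing_key: str) -> str:
        # First point at or after the key's hash; past the end wraps to 0
        i = bisect.bisect_left(self._points, _hash(routing_key)) % len(self._ring)
        return self._ring[i][1]
```

Adding a queue only takes over the keys that now land on its new points; every other key keeps its old queue, which is what makes the scheme attractive when queues come and go.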