How to retract a message in RabbitMQ?

I've got something like a job queue over RabbitMQ and, upon a request to cancel a job, I'd like to retract the tasks that have not yet started processing (their messages have not been ack'd), which corresponds to retracting these messages from the queues that they've been routed to.
I haven't found this functionality in AMQP or in the RabbitMQ API; perhaps I haven't searched well enough? Or will I have to use a workaround (it's not hard, but still)?

I would solve this scenario by having the worker check some sort of authoritative data source to determine if the job should proceed or not. For example, the worker would check the job's status in a database to see if the job was canceled already.
For scenarios where jobs may be processed faster than the authoritative store can be updated and read, a data store that trades durability guarantees for speed may be more useful.
An example of this would be to use Redis as the store for canceling processing of a message instead of a relational DB like MySQL. Redis is very fast, but makes fewer guarantees regarding the data it holds, whereas MySQL is much slower, but offers more guarantees about the data it holds.
In the end, the concept of checking with another source for whether or not to process a message is the same, but the way you implement that depends on your particular scenario.
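As a concrete illustration, here is a minimal sketch of that pattern, assuming the pika and redis Python clients; the queue name, key scheme, and do_expensive_work function are all placeholders:

```python
import json

import pika
import redis

r = redis.Redis()

def do_expensive_work(job):
    ...  # placeholder for the actual job processing

def on_message(channel, method, properties, body):
    job = json.loads(body)
    # Cancelling elsewhere is just: r.set(f"job:{job_id}:cancelled", 1)
    if r.get(f"job:{job['id']}:cancelled"):
        # Job was cancelled: ack the message so it leaves the queue, skip the work.
        channel.basic_ack(delivery_tag=method.delivery_tag)
        return
    do_expensive_work(job)
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.basic_consume(queue="jobs", on_message_callback=on_message)
channel.start_consuming()
```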

RabbitMQ doesn't let you modify or delete messages after they've been enqueued. For that, you want some kind of database to hold the state of each job, and to use RabbitMQ to notify interested parties of changes in that state.
For lowish volumes, you can kludge it together with a queue per job. Create the queue, post the job description to the queue, announce the name of the queue to the workers. If the job needs to be cancelled before it is processed, delete the job's queue; when the workers come to fetch the job description, they'll notice the queue has vanished.
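A minimal sketch of that kludge, assuming pika; the queue naming scheme and announcement mechanism are placeholders:

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

def submit_job(job_id, job_description):
    queue_name = f"job.{job_id}"
    channel.queue_declare(queue=queue_name, durable=True)
    channel.basic_publish(exchange="", routing_key=queue_name, body=job_description)
    return queue_name  # announce this name to the workers, e.g. via a control queue

def cancel_job(job_id):
    # Deleting the queue retracts the job; a worker that comes looking for
    # the job description will find the queue has vanished.
    channel.queue_delete(queue=f"job.{job_id}")
```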
Lighter-weight and generally better would be to use redis or another key/value store to hold the job state (with a deleted or absent record meaning a cancelled or nonexistent job) and to use rabbitmq to notify about new/removed/changed records in the key/value store.

At least two ways to achieve your target:
basic.reject will requeue the message if requeue=true is set (otherwise the message is discarded).
(supported since RabbitMQ 2.0.0; see http://www.rabbitmq.com/blog/2010/08/03/well-ill-let-you-go-basicreject-in-rabbitmq/).
basic.recover will ask the broker to redeliver all unacked messages on the channel.
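For illustration, both calls are exposed by client libraries such as pika; a minimal sketch (the queue name is a placeholder):

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

method, properties, body = channel.basic_get(queue="jobs", auto_ack=False)
if method is not None:
    # basic.reject: requeue=True puts the message back on the queue;
    # requeue=False discards it (or dead-letters it, if so configured).
    channel.basic_reject(delivery_tag=method.delivery_tag, requeue=True)

# basic.recover: ask the broker to redeliver every unacked message on this channel.
channel.basic_recover(requeue=True)
```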

You need to subscribe to all the queues to which messages have been routed, and consume them with ack.
For instance, if you publish to a topic exchange with "test" as the routing key, and there are 3 persistent queues which subscribe to "test", you would need to consume those three queues. It might be better to add another queue which your consumer processes would also listen to, and tell them to ignore those messages.
An alternative, since you are using RabbitMQ, is to write a custom exchange plugin that will accept some out-of-band instruction to clear all queues. For instance, you might have that exchange read a special message header that tells it to clear all queues to which this message is destined. This does require writing Erlang code, but there are 4 different exchange types implemented, so you would only need to copy the most similar one and write the code for the new behaviours. If you only use custom headers for this, then the body of the message can be a normal message for the consumers.
To sum up:
1) the publisher needs to consume the messages itself
2) the publisher can send a special message in a special queue to tell consumers to ignore the message
3) the publisher can send a special message to a custom exchange that will clear any existing messages from the queues before sending this special message to consumers.
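As a sketch of option 1, the publisher can drain its own queues with basic_get, acking (and thereby discarding) the cancelled job's messages and requeueing the rest. This assumes pika and a JSON message body with a job_id field; both are placeholders:

```python
import json

import pika

def retract_job(channel, queue_name, cancelled_job_id):
    held = []
    # First pass: drain the queue. The fetched messages stay unacked and
    # are held by this channel, so basic_get will not see them again.
    while True:
        method, properties, body = channel.basic_get(queue=queue_name, auto_ack=False)
        if method is None:
            break
        held.append((method, body))
    # Second pass: ack (discard) the cancelled job's messages, requeue the rest.
    for method, body in held:
        if json.loads(body).get("job_id") == cancelled_job_id:
            channel.basic_ack(delivery_tag=method.delivery_tag)
        else:
            # Requeued messages are flagged as redelivered and may change position.
            channel.basic_reject(delivery_tag=method.delivery_tag, requeue=True)
```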

Related

Azure Service Bus: How to keep messages sent from the sender in FIFO order

I have read a few questions from StackOverflow. They said we can enable Session Support on the queue to keep the messages FIFO. Some mention the ordering cannot be guaranteed, and that to make sure messages are processed in order we have to handle it manually during processing, using the timestamp.
Is that true?
Azure Service Bus Queue itself follows FIFO. In some cases, the processing of the messages may not be sequential. If you are sure that the size of the payload will be consistent, then you can go with the normal Queues, which will process the messages in order (works for me).
If there will be change in payload size between the messages, it is preferred to go with Session enabled Queues as Sean Feldman mentioned in his answer.
To send/receive messages in FIFO mode, you need to enable "Require Sessions" on the queue and use Message Sessions to send/receive messages. The timestamp doesn't matter. What matters is the session.
Upon sending, set the message's SessionId.
Upon receiving, either receive any session using MessageReceiver, or a specific session using the lower-level API (SessionClient) and specifying the session ID.
A good start would be to read the documentation and have a look at this sample.
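For reference, a minimal sketch of session-based send/receive with the azure-servicebus Python package; the connection string and queue name are placeholders:

```python
from azure.servicebus import ServiceBusClient, ServiceBusMessage

CONN_STR = "<connection string>"
QUEUE = "orders"  # a session-enabled ("Require Sessions") queue

with ServiceBusClient.from_connection_string(CONN_STR) as client:
    # Upon sending, set the message's session id.
    with client.get_queue_sender(QUEUE) as sender:
        sender.send_messages(ServiceBusMessage("first", session_id="order-42"))
        sender.send_messages(ServiceBusMessage("second", session_id="order-42"))

    # Upon receiving, lock onto a specific session; messages within a
    # session are delivered in FIFO order.
    with client.get_queue_receiver(QUEUE, session_id="order-42") as receiver:
        for msg in receiver:
            print(str(msg))
            receiver.complete_message(msg)
```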

SSIS - Script Component pulling information from RabbitMQ

A question that might be mostly theoretical, but I'd love to have my concerns put to rest (or confirmed).
I built a Script Component to pull data from RabbitMQ. On RabbitMQ, we basically set up a durable queue. This means the queue (and its persistent messages) survives server reboots, so messages keep accumulating between runs. This construction allows us to periodically execute the package and grab all "new" messages since the last time we did so.
(We know RabbitMQ isn't set up to accommodate this kind of scenario, but rather expects there to be a constant listener to process messages. However, we are not comfortable having some task start when SQL Server starts and run pretty much 24/7 to handle that, so we built something we can schedule to run every n minutes and empty the queue that way. If we were unable to run the task, we'd most likely be dealing with a failed SQL Server, and have different priorities.)
The component sets up a connection, and then connects to the specific exchange + queue we are pulling messages from. Messages are in JSON format, so we deserialize the message into a class we defined in the script component.
For every message found, we disable auto-acknowledge, so we can process it and only acknowledge it once we're done with it (which ensures the message will be processed and doesn't slip through). Then we deserialize the message and push it onto the output buffer of the script component.
There are a few places things can go wrong, so we built a bunch of Try/Catch blocks in the code. However, seeing we're dealing with the queue aspect, and we need the information available to us, I'm wondering if someone can explain how/when a message that is sent to the output buffer is processed.
Is it batched up and then pushed? Is it sent straight away, and is the SSIS component perhaps not updating information back to SSIS in a timely fashion?
Would there be a chance for us to acknowledge a message, but that it somehow ends up not getting committed to our database, yet popped from the queue (as I think happens once a message is acknowledged)?
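The SSIS buffer internals aside, the usual way to rule out that failure mode is to acknowledge only after the database commit. A language-agnostic sketch of that ordering (shown here with pika and a DB-API connection; the queue name and write_to_database are placeholders):

```python
def write_to_database(db_conn, body):
    ...  # hypothetical insert of the deserialized message

def drain_queue(channel, db_conn):
    # channel: a pika channel; db_conn: a DB-API connection (both assumed).
    while True:
        method, properties, body = channel.basic_get(queue="events", auto_ack=False)
        if method is None:
            break  # queue is empty
        try:
            write_to_database(db_conn, body)
            db_conn.commit()
            # Ack only after the row is durably committed. If we crash before
            # this line, the unacked message is simply redelivered later.
            channel.basic_ack(delivery_tag=method.delivery_tag)
        except Exception:
            db_conn.rollback()
            channel.basic_reject(delivery_tag=method.delivery_tag, requeue=True)
```

With this ordering the worst case is a duplicate (a crash after the commit but before the ack causes redelivery), never a message that is popped from the queue but missing from the database.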

Push Data onto Queue vs Pull Data by Workers

I am building a web site backend that involves a client submitting a request to perform some expensive (in time) operation. The expensive operation also involves gathering some set of information for it to complete.
The work that the client submits can be fully described by a uuid. I am hoping to use a service oriented architecture (SOA) (i.e. multiple micro-services).
The client communicates with the backend using RESTful communication over HTTP. I plan to use a queue that the workers performing the expensive operation can poll for work. The queue has persistence and offers decent reliability semantics.
One consideration is whether I gather all of the data needed for the expensive operation upstream and then enqueue all of that data or whether I just enqueue the uuid and let the worker fetch the data.
The two architectures under consideration are push-based (gather the data upstream and enqueue it along with the job) and pull-based (enqueue only the uuid and let the worker gather the data).
Some things that I have thought of:
In the push-based case, I would likely be blocking while I gathered the needed data, so the client's HTTP request would not be responded to until the data is gathered and then enqueued. From a UI standpoint, the request would be pending until the response comes back.
In the pull-based scenario, only the worker needs to know what data is required for the work. That means I can have multiple types of clients talking to various backends. If the data requirements change, I update just the workers and not each of the upstream services.
Anything else that I am missing here?
Another benefit of the pull based approach is that you don't have to worry about the data getting stale in the queue.
I think you already pretty much explained that the second (pull-based) approach is better.
If a user's request is going to be processed asynchronously anyway, why wait for the data to be gathered before returning a response? You just need to enqueue a work item and return the HTTP response.
Passing data via the queue is not a good option. If you gather the data upstream, you will have to pass it to the worker somehow other than via the queue (usually BLOB storage). That is additional work that is not really needed in your case.
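A minimal sketch of that pull-based flow, assuming pika; gather_data and perform_expensive_operation are hypothetical stand-ins:

```python
import json
import uuid

import pika

def handle_request(channel):
    # Called from the HTTP layer: enqueue only the uuid and return immediately.
    job_id = str(uuid.uuid4())
    channel.basic_publish(exchange="", routing_key="work",
                          body=json.dumps({"job_id": job_id}))
    return {"status": "accepted", "job_id": job_id}  # e.g. an HTTP 202 body

def on_work(channel, method, properties, body):
    # Worker side: only the worker knows what data the job needs.
    job_id = json.loads(body)["job_id"]
    data = gather_data(job_id)         # hypothetical: fetched fresh, never stale
    perform_expensive_operation(data)  # hypothetical
    channel.basic_ack(delivery_tag=method.delivery_tag)
```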
I would recommend Cadence Workflow instead of queues as it supports long running operations and state management out of the box.
Cadence offers a lot of other advantages over using queues for task processing.
Built-in exponential retries with unlimited expiration interval
Failure handling. For example, it allows executing a task that notifies another service if both updates couldn't succeed during a configured interval.
Support for long-running heartbeating operations
Ability to implement complex task dependencies. For example, to implement chaining of calls or compensation logic in case of unrecoverable failures (SAGA)
Complete visibility into the current state of the update. For example, when using queues all you know is whether there are some messages in a queue, and you need an additional DB to track the overall progress. With Cadence every event is recorded.
Ability to cancel an update in flight.
See the presentation that goes over the Cadence programming model.

How to write an event trigger which send alerts to a JMS Queue

Is there any example where we can trigger an event to send messages to a JMS queue when a table is updated/inserted etc. for MySQL/PostgreSQL?
This sounds like a good task for pg_message_queue (which you can get off Google Code or PGXN), which allows you to queue requests. pg_message_queue doesn't do a great job of parallelism yet (in terms of parallel queue consumers), but I don't think you need that.
What you really want to do (and what pg_message_queue provides) is a queue table to hold the JMS message, and then a trigger to queue that message. Then the question is how you get it from there to JMS. You have basically two options (both of which are supported):
LISTEN for notifications, and when those come in handle them.
Periodically poll for notifications. You might do this if you have a lot of notifications coming in, so you can batch them every minute or so, or if you have few notifications coming in and you want to process them at midnight.
Naturally that is PostgreSQL only. Doing the same on MySQL? I don't know how to do that. I think you would be stuck with polling the table, but you could use pg_message_queue to understand basically how to do the rest. Note that in all cases this is fully transactional so the message would not be sent until after transaction commit, which is probably what you want.
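A sketch of the LISTEN/NOTIFY route using plain PostgreSQL primitives rather than pg_message_queue's own API (psycopg2 and pika assumed; a real JMS broker would need a JMS client, so the RabbitMQ publish here is a stand-in, and the table/channel names are placeholders):

```python
import select

import pika
import psycopg2

# One-time setup: a trigger that emits a NOTIFY for each changed row.
# NOTIFY payloads are only delivered to listeners after the transaction
# commits, which gives the transactional behaviour described above.
SETUP = """
CREATE OR REPLACE FUNCTION notify_queue() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('table_events', row_to_json(NEW)::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_notify AFTER INSERT OR UPDATE ON orders
FOR EACH ROW EXECUTE PROCEDURE notify_queue();
"""

conn = psycopg2.connect("dbname=app")
conn.set_session(autocommit=True)
cur = conn.cursor()
cur.execute(SETUP)
cur.execute("LISTEN table_events;")

mq = pika.BlockingConnection(pika.ConnectionParameters("localhost")).channel()

while True:
    # Block until the server sends a notification (60 s timeout, then loop).
    if select.select([conn], [], [], 60) == ([], [], []):
        continue
    conn.poll()
    while conn.notifies:
        note = conn.notifies.pop(0)
        mq.basic_publish(exchange="", routing_key="bridge", body=note.payload)
```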

Implementing message priority in AMQP

I'm intending to use AMQP to allow a distributed collection of machines to report to a central location asynchronously. The idea is to drop messages into the queue and allow the central logging entity to process the queue in a decoupled fashion; the 'process' is simply to create or update a row in a database table.
A problem that I'm anticipating is the effect of network jitter in the message queuing process - what happens if an update accidentally gets in front of an insert because the time between the two messages being issued is less than the network jitter?
Reading the AMQP spec, it seems that I could just apply a higher priority to inserts so they skip the queue and get processed first. But presumably this only applies if a queue actually exists at the broker to be skipped. Is there a way to impose a buffer or delay at the broker to absorb this jitter and allow priority to be enacted before the messages are passed on to the consumer(s)?
Or do I have to go down the route of a resequencer as ActiveMQ suggests?
The lack of ordering between multiple publishers has nothing to do with network jitter; it's a completely natural thing in distributed applications. Messages from the same publisher will always be ordered. If you really need causal ordering of actions performed by different nodes, then either a resequencer or a global sequence numbering scheme are your only options. Note that you cannot use sender timestamps for this, which is what everyone seems to try first.
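A sketch of what a global sequence numbering scheme could look like, assuming a shared Redis counter (the exchange, key, and field names are illustrative): every publisher draws from one atomic counter, and the consumer buffers out-of-order events until the next expected number arrives:

```python
import json

import pika
import redis

r = redis.Redis()

def publish_event(channel, action, payload):
    # Every publisher draws from the same atomic counter, so consumers can
    # impose a total order regardless of network timing between nodes.
    seq = r.incr("event_seq")
    channel.basic_publish(exchange="events", routing_key="log",
                          body=json.dumps({"seq": seq, "action": action,
                                           "payload": payload}))

def make_resequencer(apply):
    # Consumer-side resequencer: buffer out-of-order events and apply them
    # (e.g. run the insert before the update) strictly in sequence order.
    next_seq = {"n": 1}
    pending = {}
    def on_event(event):
        pending[event["seq"]] = event
        while next_seq["n"] in pending:
            apply(pending.pop(next_seq["n"]))
            next_seq["n"] += 1
    return on_event
```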