Azure Service Bus: How to keep messages sent by a sender in FIFO order - message-queue

I have read a few questions on Stack Overflow. Some say we can enable session support on the queue to keep messages FIFO. Others mention that ordering cannot be guaranteed, and that to make sure messages are processed in order we have to handle it manually during processing, using the timestamp.
Is that true?

An Azure Service Bus queue is itself FIFO. In some cases, though, messages may not be processed sequentially. If you are sure that the payload size will be consistent, you can use normal queues, which will process the messages in order (this works for me).
If the payload size will vary between messages, it is preferable to use session-enabled queues, as Sean Feldman mentioned in his answer.

To send/receive messages in FIFO mode, you need to enable "Require Sessions" on the queue and use Message Sessions to send/receive messages. The timestamp doesn't matter. What matters is the session.
Upon sending, set the message's SessionId.
Upon receiving, either receive any session using MessageReceiver, or receive a specific session using the lower-level API (SessionClient) and specifying the session ID.
A good start would be to read the documentation and have a look at this sample.
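To make the session idea concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the Azure SDK: the SessionQueue class, its method names, and the session ids are all made up for illustration. It only shows the two properties that matter here: messages sharing a session id come out in send order, and a session is handed to one receiver at a time.

```python
from collections import OrderedDict, deque


class SessionQueue:
    """Toy in-memory model of a session-enabled queue (hypothetical, not the Azure SDK)."""

    def __init__(self):
        # session_id -> FIFO of message bodies, keyed in order of first arrival
        self._sessions = OrderedDict()
        self._locked = set()

    def send(self, session_id, body):
        # Every message carries a session id; order is kept per session.
        self._sessions.setdefault(session_id, deque()).append(body)

    def accept_next_session(self):
        # Hand an unlocked, non-empty session to exactly one receiver.
        for session_id, messages in self._sessions.items():
            if session_id not in self._locked and messages:
                self._locked.add(session_id)
                return session_id
        return None

    def receive(self, session_id):
        # Intended for the lock holder only; pops messages in send order.
        messages = self._sessions[session_id]
        return messages.popleft() if messages else None

    def release(self, session_id):
        self._locked.discard(session_id)


queue = SessionQueue()
for step in ("create", "update", "close"):
    queue.send("order-42", step)
queue.send("order-7", "create")

first = queue.accept_next_session()   # one receiver locks a session
second = queue.accept_next_session()  # a second receiver gets a different one

drained = []
while (body := queue.receive(first)) is not None:
    drained.append(body)
queue.release(first)
```

With a real session-enabled queue, the broker does the locking and the per-session ordering; the sender's only job is to put related messages under the same session id.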

Related

Push Data onto Queue vs Pull Data by Workers

I am building a web site backend that involves a client submitting a request to perform some expensive (in time) operation. The expensive operation also involves gathering some set of information for it to complete.
The work that the client submits can be fully described by a uuid. I am hoping to use a service oriented architecture (SOA) (i.e. multiple micro-services).
The client communicates with the backend using RESTful communication over HTTP. I plan to use a queue that the workers performing the expensive operation can poll for work. The queue has persistence and offers decent reliability semantics.
One consideration is whether I gather all of the data needed for the expensive operation upstream and then enqueue all of that data or whether I just enqueue the uuid and let the worker fetch the data.
Here are diagrams of the two architectures under consideration:
Push-based (i.e. gather data upstream):
Pull-based (i.e. worker gathers the data):
Some things that I have thought of:
In the push-based case, I would likely be blocking while I gathered the needed data, so the client's HTTP request would not be responded to until the data is gathered and then enqueued. From a UI standpoint, the request would be pending until the response comes back.
In the pull based scenario, only the worker needs to know what data is required for the work. That means I can have multiple types of clients talking to various backends. If the data needs change I update just the workers and not each of the upstream services.
Anything else that I am missing here?
Another benefit of the pull based approach is that you don't have to worry about the data getting stale in the queue.
I think you already pretty much explained that the second (pull-based) approach is better.
If a user's request is going to be processed asynchronously anyway, why wait for the data to be gathered before returning a response? You just need to queue a work item and return the HTTP response.
Passing data via queue is not a good option. If you get the data upstream, you will have to pass it somehow other than via queue to the worker (usually BLOB storage). That is additional work that is not really needed in your case.
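A minimal sketch of the pull-based flow, assuming an in-process queue.Queue in place of the real queue and a plain dict (JOB_STORE, a hypothetical name) in place of wherever the data actually lives. The request handler enqueues only the uuid and returns immediately; the worker fetches the data when it picks the job up, so it always sees the current version.

```python
import queue
import uuid

# Hypothetical stand-in for the system of record the worker reads from.
JOB_STORE = {}
work_queue = queue.Queue()


def submit(payload):
    """Request handler: record the job, enqueue only its id, return at once."""
    job_id = str(uuid.uuid4())
    JOB_STORE[job_id] = {"payload": payload, "status": "queued"}
    work_queue.put(job_id)
    return job_id


def work_once():
    """Worker: take one id and fetch the data at processing time."""
    job_id = work_queue.get()
    job = JOB_STORE[job_id]
    job["result"] = job["payload"].upper()  # placeholder for the expensive step
    job["status"] = "done"
    work_queue.task_done()
    return job_id


job_id = submit("resize image")
JOB_STORE[job_id]["payload"] = "resize image v2"  # data changes while queued
work_once()
```

Because the message holds only the id, the change made while the job was waiting is picked up by the worker, which is the staleness point made above.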
I would recommend Cadence Workflow instead of queues as it supports long running operations and state management out of the box.
Cadence offers a lot of other advantages over using queues for task processing.
Built-in exponential retries with unlimited expiration interval
Failure handling. For example, it allows executing a task that notifies another service if both updates couldn't succeed during a configured interval.
Support for long running heartbeating operations
Ability to implement complex task dependencies. For example, to implement chaining of calls or compensation logic in case of unrecoverable failures (SAGA)
Gives complete visibility into the current state of the update. For example, when using queues, all you know is whether there are some messages in a queue, and you need an additional DB to track the overall progress. With Cadence every event is recorded.
Ability to cancel an update in flight.
See the presentation that goes over Cadence programming model.
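For readers unfamiliar with the first item in that list, "exponential retries" means the wait between attempts grows by a constant factor, usually up to a cap. The sketch below only illustrates the arithmetic; it is not Cadence's API, and the function name, parameter names, and default values are assumptions chosen for the example.

```python
def backoff_schedule(initial=1.0, coefficient=2.0, maximum=60.0, attempts=8):
    """Delay in seconds before each retry: initial * coefficient**n, capped at maximum."""
    return [min(initial * coefficient ** n, maximum) for n in range(attempts)]


delays = backoff_schedule()
# delays == [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

With plain queues you would typically have to implement and persist this schedule yourself, which is the point of the comparison.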

Should a message queue server be facing the Internet directly or not?

I have the following use case:
message size: ~4kb
protocol type: considering MQTT
message queue server: considering RabbitMQ or Mosquitto
up to 50k msg / s arriving messages
each message is sent from a mobile client with various network connectivity
What I would like to know is: what is the better way to have the system ingest the messages?
A) expose the message queue server directly to the Internet and process the messages later for consistency / validity (of course with a load balancer in front of the servers)
B) expose a server that can read the message in the native format, apply some basic validity checks and then queue the message to an internal message queue server
I'm leaning towards the second option, but I have no real arguments for or against it compared with the first option, so can you please advise on this one?
Thank you.
Your question has two parts:
Whether or not to expose the message queue server to the internet
Whether or not to process the message immediately
For the first question, I would advise putting the server behind a firewall. That way, you will have more tools to protect your server against attacks from the Internet.
For the second question, it depends on whether or not the server is required to inform the mobile client about the message processing result, and whether the result of the message processing should be known immediately:
If you are not required to send feedback to the mobile client and the message does not need to be processed immediately, I would advise logging the message and then processing it later in batch mode.
If you are required to send feedback to the mobile client but the message does not need to be processed immediately, I would advise running a sanity check on the message, sending the feedback to the mobile client, and then logging the message for batch processing.
Otherwise, I would advise running the sanity check, processing the message, and sending feedback to the mobile client.
In my advice, I have suggested using batch mode over online mode as much as possible. When you operate in batch mode, you have more options for using your computing resources efficiently in a simple way.
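The "sanity check, then queue internally" path (option B in the question) could look roughly like the sketch below. Everything specific in it is an assumption: the JSON format, the field names in REQUIRED_FIELDS, the 8 KB cap, and the in-process queue.Queue standing in for the internal broker. A real front end would speak MQTT and publish to RabbitMQ or similar.

```python
import json
import queue

MAX_BYTES = 8 * 1024  # assumed cap; the question expects messages of about 4 KB
REQUIRED_FIELDS = {"device_id", "sent_at", "body"}  # hypothetical schema

internal_queue = queue.Queue()  # stand-in for the internal message queue server


def ingest(raw: bytes) -> bool:
    """Cheap sanity check at the edge; enqueue internally only if it passes."""
    if len(raw) > MAX_BYTES:
        return False
    try:
        message = json.loads(raw)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    if not isinstance(message, dict) or not REQUIRED_FIELDS.issubset(message):
        return False
    internal_queue.put(message)
    return True


accepted = ingest(b'{"device_id": "a1", "sent_at": 1700000000, "body": "hi"}')
rejected_garbage = ingest(b"not json")
rejected_missing = ingest(b'{"device_id": "a1"}')
```

The boolean returned by ingest is what you would turn into immediate feedback to the mobile client, while the heavier processing happens later off the internal queue.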

Acknowledgment from Consumer in ActiveMQ

I want to acknowledge messages after they have been processed by a processing engine such as Yahoo! S4. I can only send the messages to the engine using an adapter.
Currently I am storing each message in the adapter until the processing engine returns the JMSCorrelationID of the message, indicating that the message has been processed. But storing each message takes up a lot of space on the adapter.
So is there any way to manually create the acknowledgement using the JMSCorrelationID and send it to the broker?
No, this sort of thing is not supported by the JMS spec. You can use transactions and commit when your adapter has finished its work.

Implementing message priority in AMQP

I'm intending to use AMQP to allow a distributed collection of machines to report to a central location asynchronously. The idea is to drop messages into the queue and allow the central logging entity to process the queue in a decoupled fashion; the 'process' is simply to create or update a row in a database table.
A problem that I'm anticipating is the effect of network jitter in the message queuing process - what happens if an update accidentally gets in front of an insert because the time between the two messages being issued is less than the network jitter?
Reading the AMQP spec, it seems that I could just apply a higher priority to inserts so they skip the queue and get processed first. But presumably this only applies if a queue actually exists at the broker to be skipped. Is there a way to impose a buffer or delay at the broker to absorb this jitter and allow priority to be enacted before the messages are passed on to the consumer(s)?
Or do I have to go down the route of a resequencer as ActiveMQ suggests?
The lack of ordering between multiple publishers has nothing to do with network jitter; it's a completely natural thing in distributed applications. Messages from the same publisher will always be ordered. If you really need causal ordering of actions performed by different nodes, then a resequencer or a global sequence numbering scheme are your only options. Note that you cannot use sender timestamps for this, which is what everyone seems to try first.
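A resequencer driven by a global sequence number can be sketched in a few lines. This assumes something upstream already assigns gap-free sequence numbers starting at 1; the Resequencer class and the example payloads are made up for illustration, and a production version would also need a timeout for numbers that never arrive.

```python
import heapq


class Resequencer:
    """Release messages in sequence-number order, buffering any that arrive early."""

    def __init__(self, first_seq=1):
        self._next = first_seq
        self._pending = []  # min-heap of (seq, payload)

    def accept(self, seq, payload):
        """Buffer one message and return every message that is now deliverable."""
        if seq < self._next:
            return []  # duplicate of something already delivered; drop it
        heapq.heappush(self._pending, (seq, payload))
        ready = []
        while self._pending and self._pending[0][0] == self._next:
            ready.append(heapq.heappop(self._pending)[1])
            self._next += 1
        return ready


reseq = Resequencer()
early = reseq.accept(2, "update row 17")     # arrives first, so it is held back
in_order = reseq.accept(1, "insert row 17")  # fills the gap and releases both
```

Here the update that overtook the insert is held until the insert shows up, so the consumer sees them in the intended order.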

How to retract a message in RabbitMQ?

I've got something like a job queue over RabbitMQ and, upon a request to cancel a job, I'd like to retract the tasks that have not yet started processing (their messages have not been ack'd), which corresponds to retracting these messages from the queues that they've been routed to.
I haven't found this functionality in AMQP or in the RabbitMQ API; perhaps I haven't searched well enough? Or will I have to use a workaround (it's not hard, but still)?
I would solve this scenario by having the worker check some sort of authoritative data source to determine whether the job should proceed or not. For example, the worker would check the job's status in a database to see if the job was canceled already.
For scenarios where the speed of processing jobs may be faster than the speed with which the authoritative store can be updated and read, a less guaranteed data store that trades speed for other characteristics may be useful.
An example of this would be to use Redis as the store for canceling processing of a message instead of a relational DB like MySQL. Redis is very fast, but makes fewer guarantees regarding the data it holds, whereas MySQL is much slower, but offers more guarantees about the data it holds.
In the end, the concept of checking with another source for whether or not to process a message is the same, but the way you implement that depends on your particular scenario.
RabbitMQ doesn't let you modify or delete messages after they've been enqueued. For that, you want some kind of database to hold the state of each job, and to use RabbitMQ to notify interested parties of changes in that state.
For lowish volumes, you can kludge it together with a queue per job. Create the queue, post the job description to the queue, announce the name of the queue to the workers. If the job needs to be cancelled before it is processed, delete the job's queue; when the workers come to fetch the job description, they'll notice the queue has vanished.
Lighter-weight and generally better would be to use Redis or another key/value store to hold the job state (with a deleted or absent record meaning a cancelled or nonexistent job) and to use RabbitMQ to notify about new/removed/changed records in the key/value store.
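Here is a rough sketch of that key/value approach, with a plain dict standing in for Redis and queue.Queue standing in for the RabbitMQ queue (both substitutions, and all the function names, are assumptions for the example). Cancelling never touches the queue: it just removes the record, and the worker skips any job whose record is gone.

```python
import queue

job_state = {}             # stand-in for Redis or a database table
job_queue = queue.Queue()  # stand-in for the RabbitMQ queue


def submit(job_id, description):
    job_state[job_id] = description
    job_queue.put(job_id)


def cancel(job_id):
    # Deleting the record is the cancellation; the queued message is left alone.
    job_state.pop(job_id, None)


def drain():
    """Worker loop: skip any job whose record is gone by the time it is dequeued."""
    processed = []
    while not job_queue.empty():
        job_id = job_queue.get()
        description = job_state.get(job_id)
        if description is None:
            continue  # cancelled or unknown: take it off the queue and move on
        processed.append((job_id, description))
    return processed


submit("job-1", "render report")
submit("job-2", "send email")
cancel("job-1")
processed = drain()
```

The cancelled job's message is still consumed, but it is effectively a no-op, which is usually cheaper than trying to pull it back out of the broker.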
At least two ways to achieve your target:
basic.reject will requeue the message if requeue=true is set (otherwise the broker discards it, or dead-letters it if a dead-letter exchange is configured).
(supported since RabbitMQ 2.0.0; see http://www.rabbitmq.com/blog/2010/08/03/well-ill-let-you-go-basicreject-in-rabbitmq/).
basic.recover will ask the broker to redeliver unacked messages on the channel.
You need to subscribe to all the queues to which messages have been routed, and consume them with ack.
For instance, if you publish to a topic exchange with "test" as the routing key, and there are 3 persistent queues which subscribe to "test", you would need to consume those three queues. It might be better to add another queue which your consumer processes would also listen to, and tell them to ignore those messages.
An alternative, since you are using RabbitMQ, is to write a custom exchange plugin that will accept some out-of-band instruction to clear all queues. For instance, you might have that exchange read a special message header that tells it to clear all queues to which this message is destined. This does require writing Erlang code, but there are 4 different exchange types implemented, so you would only need to copy the most similar one and write the code for the new behaviours. If you only use custom headers for this, then the body of the message can be a normal message for the consumers.
To sum up:
1) the publisher needs to consume the messages itself
2) the publisher can send a special message in a special queue to tell consumers to ignore the message
3) the publisher can send a special message to a custom exchange that will clear any existing messages from the queues before sending this special message to consumers.