How does Amazon SQS ensure the same message is not sent to different instances of the same service? - message-queue

I have a queue (in this case Amazon SQS) and N nodes of the same service consuming messages from it.
How can I make sure that, at any point in time, no more than one node has read the same message from the queue?
In Kafka, we know that no more than one consumer from the same consumer group can be assigned to a single topic partition. How is the same thing handled in Amazon SQS, if at all?

The mechanism Amazon SQS uses to prevent a message from being delivered to multiple consumers is the visibility timeout:
Immediately after a message is received, it remains in the queue. To prevent other consumers from processing the message again, Amazon SQS sets a visibility timeout, a period of time during which Amazon SQS prevents other consumers from receiving and processing the message. The default visibility timeout for a message is 30 seconds. The minimum is 0 seconds. The maximum is 12 hours.
After a message is received, SQS starts the timeout and, for its duration, doesn't deliver that message to other consumers. If the message has not been deleted by the time the timeout expires, SQS makes it available to other consumers again.
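The receive → process → delete cycle described above can be sketched as follows. This is a minimal sketch: the client is passed in and is assumed to expose a boto3-style `receive_message`/`delete_message` interface, and the timeout values are illustrative, not prescriptive:

```python
def consume_one(sqs, queue_url, handler):
    """Receive up to one message, process it, and delete it before the
    visibility timeout expires so it is not redelivered."""
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        VisibilityTimeout=60,   # seconds this consumer "owns" the message
        WaitTimeSeconds=10,     # long polling to reduce empty responses
    )
    handled = 0
    for msg in resp.get("Messages", []):
        handler(msg["Body"])
        # Deleting the message is the acknowledgement; if we crash before
        # this call, SQS makes the message visible again after the timeout.
        sqs.delete_message(QueueUrl=queue_url,
                           ReceiptHandle=msg["ReceiptHandle"])
        handled += 1
    return handled
```

Because the client is injected, the same loop runs against a real boto3 client or a stub in tests.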
But as the note says:
For standard queues, the visibility timeout isn't a guarantee against receiving a message twice. For more information, see At-Least-Once Delivery.
If you need an absolute guarantee of exactly-once processing, you have two options:
Design your application to be idempotent, so that the result is the same whether it processes the same message once or several times.
Use an SQS FIFO queue, which provides exactly-once processing.
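The idempotency option can be as simple as recording the IDs of messages already processed and skipping duplicates. A minimal sketch (in production the dedup store would be a database or cache shared across consumers, not an in-memory set):

```python
class IdempotentProcessor:
    """Process each message ID at most once, even if SQS delivers it twice."""

    def __init__(self, do_work):
        self.do_work = do_work
        self.seen = set()   # stand-in for a persistent dedup store

    def handle(self, message_id, body):
        if message_id in self.seen:
            return False    # duplicate delivery: safely ignored
        self.do_work(body)
        self.seen.add(message_id)
        return True
```

With this in place, an at-least-once queue behaves, from the application's point of view, like exactly-once.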

Related

MySQL Aurora connection management with SQS and Lambda

I have a use case in my system where I need to process hundreds of user records nightly. Currently, I have a scheduled Lambda function which pulls all the users to be processed and places each onto an SQS queue. I then have another Lambda function that reads from this queue and handles the processing. Each user requires quite a lot of processing which uses quite a few connections for each user. I use a mysql transaction in as many places as I can to cut down the connections used. I'm running into issues with my Aurora MySQL database hitting the connection limit (1000 currently). I have tried playing around with the batch sizes as well as the lambda concurrency but I still seem to run into issues. Currently, the batch size is 10 and concurrency is 1. The Lambda function does not use a connection pool as I found that caused more issues with connections. Am I missing something here or is this just an issue with MySQL and Lambda scaling?
Thanks
Amazon RDS Proxy is the solution provided by AWS to prevent a large number of Lambda functions running at the same time from overwhelming the connection limit of the database instance.
Alternatively, you could use this trick to throttle the rate of lambdas:
Create another SQS queue and fill it with a finite set of elements. Say 100 elements, for instance. The values you put into this queue don't matter. It's the quantity that is important.
Lambdas are activated by this queue.
When the lambdas are activated, they pull the next message from your first SQS queue, which holds the users to be processed.
If there are no more users to process, i.e. if the first queue is empty, then the lambda exits without connecting to Aurora.
Each lambda invocation processes the user. When it is done, it disconnects from Aurora and then pushes a new element onto the second SQS queue as its last step, which activates another lambda.
This way there are never more than 100 lambdas running at a time. Adjust this value to however many lambdas you want to allow concurrently.
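The token-queue trick above can be sketched as a single handler. This is a sketch, not a full Lambda: `sqs` is assumed to be a boto3-style client and is injected so the logic can be exercised with a stub, and both queue URLs are hypothetical:

```python
def handle_token(sqs, jobs_queue_url, token_queue_url, process_user):
    """Invoked once per token message; processes at most one user,
    then re-queues a token as its last step to trigger another lambda."""
    resp = sqs.receive_message(QueueUrl=jobs_queue_url,
                               MaxNumberOfMessages=1)
    messages = resp.get("Messages", [])
    if not messages:
        # First queue is empty: exit without re-queuing a token,
        # letting the pool of running lambdas drain naturally.
        return False
    msg = messages[0]
    process_user(msg["Body"])   # do the Aurora work, then disconnect
    sqs.delete_message(QueueUrl=jobs_queue_url,
                       ReceiptHandle=msg["ReceiptHandle"])
    # Last step: push a fresh token so another lambda fires.
    sqs.send_message(QueueUrl=token_queue_url, MessageBody="token")
    return True
```

The number of tokens seeded into the second queue is the concurrency ceiling; the token is only replaced after the database connection has been released.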

How do you process messages with deadlines

If you have a continuous stream of messages with different deadlines, how would you process them in order of deadline?
I have implemented this by saving each message to a persistent store, then scheduling jobs that process the most recently expired messages.
If I have to implement this with a pub/sub mechanism, how should my queue give priority by deadline time? Is there any queuing solution that delivers messages based on TTL or deadline?
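The "persist, then process by deadline" approach described in the question can be sketched with a min-heap keyed on deadline, which yields expired messages earliest-deadline-first (a stdlib sketch of the idea, not a specific queueing product):

```python
import heapq
import time


class DeadlineQueue:
    """Hold messages until their deadline passes, then release them
    in deadline order."""

    def __init__(self):
        self._heap = []      # entries: (deadline, insertion order, message)
        self._counter = 0    # tie-breaker keeps heap comparisons stable

    def push(self, deadline, message):
        heapq.heappush(self._heap, (deadline, self._counter, message))
        self._counter += 1

    def pop_expired(self, now=None):
        """Yield messages whose deadline has passed, earliest first."""
        now = time.time() if now is None else now
        while self._heap and self._heap[0][0] <= now:
            yield heapq.heappop(self._heap)[2]
```

A scheduled job would call `pop_expired()` periodically; for a managed equivalent, per-message delays (e.g. SQS delay seconds) only cover short, fixed postponements, so a store-and-poll design like this is the common workaround.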

Azure Service Bus: How to keep messages sent from the sender in FIFO order

I have read a few questions on Stack Overflow. They said we can enable session support on the queue to keep messages FIFO. Some mention that ordering cannot be guaranteed, and that to make sure messages are processed in order we have to handle it manually during processing, using the timestamp.
Is that true?
Azure Service Bus queues themselves follow FIFO. In some cases, the processing of the messages may not be sequential. If you are sure that the size of the payload will be consistent, then you can go with normal queues, which will process the messages in order (works for me).
If there will be a change in payload size between messages, it is preferable to go with session-enabled queues, as Sean Feldman mentioned in his answer.
To send/receive messages in FIFO mode, you need to enable "Require Sessions" on the queue and use message sessions to send/receive messages. The timestamp doesn't matter; what matters is the session.
Upon sending, set the message's SessionId.
Upon receiving, either receive any session using MessageReceiver, or receive a specific session using the lower-level API (SessionClient) and specifying the session ID.
A good start would be to read the documentation and have a look at this sample.
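Conceptually, sessions give FIFO ordering per session key by routing every message with the same SessionId into one sub-queue that is locked to a single receiver. A toy stdlib model of that grouping (hypothetical names, not the Service Bus SDK):

```python
from collections import defaultdict, deque


class SessionBroker:
    """Toy model of session-based ordering: messages with the same
    session ID land in one FIFO sub-queue, drained by one receiver."""

    def __init__(self):
        self._sessions = defaultdict(deque)

    def send(self, session_id, body):
        self._sessions[session_id].append(body)

    def receive_session(self, session_id):
        """Drain one session in strict send order."""
        q = self._sessions[session_id]
        while q:
            yield q.popleft()
```

Ordering is guaranteed within a session; messages in different sessions may still interleave, which is exactly the trade-off the answers above describe.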

Amazon SQS Worker Tier Auto Scaling fails

We have an SQS Worker Tier app subscribed to a queue. When it is running it works fine, however, when it gets busy, and scales up, the new instance starts getting messages almost immediately, before it is actually ready. This results in 500 responses, and the messages being discarded to the dead letter queue.
We have our queue configured with a maximum attempt of 1; due to the database changes a message will make during consumption we can't just put it back in the queue in case of error.
I have tried using the monitor health url as I would with a normal web app, but this doesn't seem to work as messages continue to be sent regardless.
Is there any way of setting a delay on any new auto scaled instance before is starts receiving messages from the queue?
I am not sure how the instance is 'getting messages' before it's ready, unless you are actually using SNS to PUSH the messages to the endpoint, as opposed to having the endpoint (instance) PULL the messages from the queue.
If you are pushing messages via SNS, then the easiest solution is to have the instance POLL the SQS queue for messages when it's ready to process them - much safer and more reliable, and obviously the instance can decide for itself when it's ready to do work.
It also sounds to me like your solution is not architected properly. If accidentally processing the same message twice would cause problems in your database, then you're not using SQS in the correct manner. Work that SQS does should be idempotent - i.e. it should be able to be processed more than once without causing problems. Even when everything is running 100% correctly, on your end and at AWS, it's possible that the same message will be sent more than once to your workers - you can't prevent that - and your processing needs to be able to handle that gracefully.
You can set the HTTP Connection setting (Configuration > Worker Configuration) in order to limit the number of concurrent connections to your worker. If you set it to 1, you're sure that 1 worker won't receive another request unless it has already responded.

How to retract a message in RabbitMQ?

I've got something like a job queue over RabbitMQ and, upon a request to cancel a job, I'd like to retract the tasks that have not yet started processing (their messages have not been ack'd), which corresponds to retracting these messages from the queues that they've been routed to.
I haven't found this functionality in AMQP or in the RabbitMQ API; perhaps I haven't searched well enough? Or will I have to use a workaround (it's not hard, but still)?
I would solve this scenario by having the worker check some sort of authoritative data source to determine whether the job should proceed or not. For example, the worker would check the job's status in a database to see if the job was canceled already.
For scenarios where the speed of processing jobs may be faster than the speed with which the authoritative store can be updated and read, a less guaranteed data store that trades speed for other characteristics may be useful.
An example of this would be to use Redis as the store for canceling processing of a message instead of a relational DB like MySQL. Redis is very fast, but makes fewer guarantees regarding the data it holds, whereas MySQL is much slower, but offers more guarantees about the data it holds.
In the end, the concept of checking with another source for whether or not to process a message is the same, but the way you implement that depends on your particular scenario.
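The "check an authoritative store before working" pattern boils down to one guard at the top of the consumer. A sketch, with `status_store` standing in for the database or Redis lookup (anything with a `.get()` method works, so the logic is easy to test):

```python
def handle_job(status_store, job_id, do_work):
    """Consume a job message, but skip the work if the job was cancelled.

    The message is still acknowledged either way, so the queue drains;
    only the side effects are suppressed for cancelled jobs.
    """
    if status_store.get(job_id) == "cancelled":
        return "skipped"
    do_work(job_id)
    return "processed"
```

Cancelling a job then never touches RabbitMQ at all: you just flip the record in the store, and the message becomes a no-op when it is eventually delivered.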
RabbitMQ doesn't let you modify or delete messages after they've been enqueued. For that, you want some kind of database to hold the state of each job, and to use RabbitMQ to notify interested parties of changes in that state.
For lowish volumes, you can kludge it together with a queue per job. Create the queue, post the job description to the queue, and announce the name of the queue to the workers. If the job needs to be cancelled before it is processed, delete the job's queue; when the workers come to fetch the job description, they'll notice the queue has vanished.
Lighter-weight and generally better would be to use Redis or another key/value store to hold the job state (with a deleted or absent record meaning a cancelled or nonexistent job) and to use RabbitMQ to notify about new/removed/changed records in the key/value store.
At least two ways to achieve your target:
basic.reject will requeue the message if requeue=true is set (otherwise it will discard the message).
(supported since RabbitMQ 2.0.0; see http://www.rabbitmq.com/blog/2010/08/03/well-ill-let-you-go-basicreject-in-rabbitmq/).
basic.recover will ask the broker to redeliver unacked messages on the channel.
You need to subscribe to all the queues to which the messages have been routed, and consume them with ack.
For instance, if you publish to a topic exchange with "test" as the routing key, and there are 3 persistent queues which subscribe to "test", you would need to consume those three queues. It might be better to add another queue which your consumer processes would also listen to, and tell them to ignore those messages.
An alternative, since you are using RabbitMQ, is to write a custom exchange plugin that will accept some out-of-band instruction to clear all queues. For instance, you might have that exchange read a special message header that tells it to clear all queues to which this message is destined. This does require writing Erlang code, but there are 4 different exchange types implemented, so you would only need to copy the most similar one and write the code for the new behaviours. If you only use custom headers for this, then the body of the message can be a normal message for the consumers.
To sum up:
1) the publisher needs to consume the messages itself
2) the publisher can send a special message in a special queue to tell consumers to ignore the message
3) the publisher can send a special message to a custom exchange that will clear any existing messages from the queues before sending this special message to consumers.