I am using Cloud Functions to schedule Cloud Tasks; at the scheduled time, each task triggers an HTTP endpoint. As of now I have created a single queue with the following configuration:
Max dispatches per second: 500
Max concurrent dispatches: 1000
Max attempts: 5
The Cloud Function is Pub/Sub triggered. In a single second, Pub/Sub may receive 10,000 messages, so the Cloud Function scales out and creates 10,000 tasks.
Question:
If the scaled Cloud Function instances have to create more tasks and assign them to different queues, how should the function decide when to create new queues and how to distribute tasks among them, taking cold- and warm-queue behavior into account to avoid latency?
I read through the official doc, but it is not very clear for dummies: https://cloud.google.com/tasks/docs/manage-cloud-task-scaling#queue
Back to your original question: if your process is time sensitive and you need to trigger more than 500 requests at the same time, you need to create additional queues (as mentioned in the documentation).
To dispatch the messages across several queues, you need to define the number of queues and a sharding key. If you have a numerical ID, you can use modulo X (where X is the number of queues) as the key and pick the corresponding queue name. You can also use a hash of your data.
In your process, if the target queue already exists, add the task to it; otherwise create the queue first and then add the task. In any case, you can't have more than 1000 queues.
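A minimal sketch of that sharding logic with the Python Cloud Tasks client, assuming numeric IDs, hypothetical queue names (shard-0 through shard-N-1), and placeholder project, location, and target URL:

# Hypothetical sketch: shard tasks across N queues by ID modulo, creating a
# queue on first use. All names and values below are placeholders.
from google.api_core import exceptions
from google.cloud import tasks_v2

PROJECT, LOCATION, NUM_QUEUES = "my-project", "us-central1", 10
TARGET_URL = "https://example.com/task-handler"

client = tasks_v2.CloudTasksClient()

def enqueue(entity_id: int, payload: bytes) -> None:
    # Modulo sharding: a numeric ID maps deterministically to one queue.
    queue_name = f"shard-{entity_id % NUM_QUEUES}"
    queue_path = client.queue_path(PROJECT, LOCATION, queue_name)
    task = {
        "http_request": {
            "http_method": tasks_v2.HttpMethod.POST,
            "url": TARGET_URL,
            "body": payload,
        }
    }
    try:
        client.create_task(parent=queue_path, task=task)
    except exceptions.NotFound:
        # Queue doesn't exist yet: create it, then retry the task once.
        client.create_queue(
            parent=f"projects/{PROJECT}/locations/{LOCATION}",
            queue={"name": queue_path},
        )
        client.create_task(parent=queue_path, task=task)

Keep in mind the cold/warm point from your question: a freshly created queue starts cold, so expect extra latency on its first dispatches compared with a queue that has been receiving steady traffic.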
Related
I have a use case in my system where I need to process hundreds of user records nightly. Currently, I have a scheduled Lambda function which pulls all the users to be processed and places each onto an SQS queue. I then have another Lambda function that reads from this queue and handles the processing.
Each user requires quite a lot of processing, which uses quite a few connections per user. I use a MySQL transaction in as many places as I can to cut down the connections used. I'm running into issues with my Aurora MySQL database hitting its connection limit (currently 1000). I have tried playing around with the batch size as well as the Lambda concurrency, but I still run into issues. Currently, the batch size is 10 and the concurrency is 1. The Lambda function does not use a connection pool, as I found that caused more issues with connections.
Am I missing something here, or is this just an issue with MySQL and Lambda scaling?
Thanks
Amazon RDS Proxy is the solution AWS provides to prevent a large number of concurrently running Lambda functions from overwhelming the connection limit of the database instance.
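Switching to it is mostly a configuration change: the Lambda connects to the proxy endpoint instead of the cluster endpoint, and the proxy multiplexes those connections onto a bounded pool. A minimal sketch, with every value a placeholder:

# Hypothetical sketch: connect to the RDS Proxy endpoint instead of the
# Aurora cluster endpoint. All values below are placeholders.
import os
import pymysql

conn = pymysql.connect(
    host="my-proxy.proxy-abc123.us-east-1.rds.amazonaws.com",  # proxy endpoint
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASSWORD"],
    database="appdb",
    connect_timeout=5,
)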
Alternatively, you could use this trick to throttle the concurrency of your Lambdas (a sketch follows the steps below):
1. Create another SQS queue and fill it with a finite set of elements, say 100. The values you put into this queue don't matter; it's the quantity that is important.
2. Lambdas are activated by this second queue.
3. When a Lambda is activated, it requests the next value from your first SQS queue, which holds the users to be processed.
4. If there are no more users to process, i.e. if the first queue is empty, the Lambda exits without connecting to Aurora.
5. Each Lambda invocation processes one user. When it is done, it disconnects from Aurora and then pushes a new element onto the second SQS queue as its last step, which activates another Lambda.
This way there are never more than 100 Lambdas running at a time. Adjust this value to however many Lambdas you want to allow concurrently.
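A minimal sketch of the Lambda handler for this pattern, assuming boto3, placeholder queue URLs, and a hypothetical process_user function:

# Hypothetical sketch of the token-queue throttle. Queue URLs and
# process_user are placeholders, not values from the question.
import boto3

sqs = boto3.client("sqs")
WORK_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/users-to-process"
TOKEN_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/tokens"

def process_user(body):
    pass  # the expensive per-user work against Aurora goes here

def handler(event, context):
    # Invoked via the token queue: each token is permission for one unit of work.
    resp = sqs.receive_message(QueueUrl=WORK_QUEUE_URL, MaxNumberOfMessages=1)
    messages = resp.get("Messages", [])
    if not messages:
        return  # work queue is empty: exit without touching Aurora
    msg = messages[0]
    process_user(msg["Body"])
    sqs.delete_message(QueueUrl=WORK_QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
    # Re-issue the token as the very last step; this activates the next Lambda.
    sqs.send_message(QueueUrl=TOKEN_QUEUE_URL, MessageBody="token")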
I have a Google Calendar trigger which fires whenever a change in a user's calendar is detected. The trigger sends some data to a 3rd-party system and runs some logic. It is important that concurrent trigger executions do not send at the same time.
Therefore I am using the Lock Service, which prevents exactly this:
var lock = LockService.getScriptLock();
try {
  // Wait up to 30 seconds for other executions to release the lock.
  lock.waitLock(30000);
} catch (e) {
  Logger.log('Could not obtain lock after 30 seconds.');
  return; // bail out instead of running doSomeStuff() without the lock
}
// This can take a few seconds.
doSomeStuff();
lock.releaseLock();
return;
The problem is that the execution of a trigger can sometimes take up to 10 seconds, and it can also happen that multiple triggers run at once, each waiting to hold the lock.
This means that the total execution time limit can easily be exceeded.
In my opinion this cannot be solved with Google Apps Script alone. To handle it, I guess the best way would be some kind of queue that the trigger writes to, with Google Cloud Functions taking the data from this queue and running the logic.
How to solve this?
The only queue-like resource I could find in the Google world was Cloud Tasks, but I am not sure if this is the best fit (it also needs a lot of setup work) when there are 10K users, each with a calendar trigger running.
Another idea was that every user trigger writes its data to a database like Firestore, and in the backend Google Cloud Functions read and delete the data to run the logic. In this case the DB would also act as a simplified queue.
In both cases the actual logic doSomeStuff() has to run in Google Cloud Functions. However, since there are 10K users and every user can fire multiple triggers, I want full control over how many Cloud Function instances run at the same time.
Summary
So in my head the solution is something like this:
trigger writes to
-> Queue (DB? Cloud Tasks?)
-> Backend Function watches Queue (Cloud Functions? Compute Engine?)
-> starts (up to a max number of) Cloud Functions to dequeue and run the actual logic
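For the DB-as-queue variant, a rough sketch of what that backend function could look like (the collection name, batch size, and stubbed-out logic are all assumptions):

# Hypothetical sketch: a Cloud Function that drains Firestore "queue"
# documents in bounded batches, keeping concurrency under our control.
from google.cloud import firestore

db = firestore.Client()

def do_some_stuff(payload):
    pass  # placeholder for the actual per-event logic

def drain_queue(request):
    # Process a bounded batch per invocation instead of everything at once.
    docs = db.collection("trigger-queue").order_by("createdAt").limit(10).stream()
    for doc in docs:
        do_some_stuff(doc.to_dict())
        doc.reference.delete()  # dequeue: remove the item once processed
    return "ok"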
Questions:
is such a "complex" structure involving those resources necessary?
if so, what are the best Google Cloud Resources to achieve this?
I have a Pub/Sub topic that I wish to consume from a single Cloud Function deployed in multiple regions. The behaviour I observe is that all messages get consumed by a single function (i.e. they all get consumed in one region).
Is there a reason that messages are not being distributed and consumed in the other regions?
It is not possible to have a cross-region Pub/Sub trigger with Cloud Functions given that the service is regional. What this means is that even if the same function is deployed to different regions, they are completely independent. However, if they have the exact same name, then only the first Function created will actually receive messages because the subscription name is unique per Cloud Function name and will be tied to this first deployment. If the functions have different names, then every deployment will create a unique subscription and receive the full stream of messages. We are looking at ways to improve this experience.
I am currently working on a Service Bus trigger (using C#) which copies and moves related blobs to another blob storage account and Azure Data Lake. After copying, the function has to emit a notification to trigger further processing, so I need to know when the copy/move task has finished.
My first approach was to use an Azure Function which copies all these files. However, Azure Functions have a processing time limit of 10 minutes (when manually set), so that seems to be the wrong solution. I was considering calling azCopy or StartCopyAsync() to perform an asynchronous copy, but as far as I understand, the function would then run for as long as the copy takes. To solve the time-limit problem I could use WebJobs instead, but there are also other technologies like Logic Apps, Durable Functions, Batch jobs, etc., which leaves me confused about choosing the right technology for this problem. The function won't be called every second but might copy large amounts of data. Does anybody have an idea?
I just found out that Azure Functions only have a time limit on the Consumption plan. If there is no better solution for blob copy tasks, I'll go with Azure Functions.
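For what it's worth, a server-side copy doesn't require the function to stream the bytes itself; it can start the copy and poll the status. A rough sketch (Python SDK shown for illustration; the C# StartCopyFromUriAsync / StartCopyAsync pattern is analogous, and all names and connection strings here are placeholders):

# Hypothetical sketch: start a server-side blob copy and poll until it
# completes, then emit the notification. Endpoints, SAS token, and names
# are placeholders.
import time
from azure.storage.blob import BlobClient

source_url = "https://source.blob.core.windows.net/container/big.dat?<sas>"
dest = BlobClient.from_connection_string(
    conn_str="<destination-connection-string>",
    container_name="archive",
    blob_name="big.dat",
)

dest.start_copy_from_url(source_url)   # server-side copy; returns immediately
props = dest.get_blob_properties()
while props.copy.status == "pending":  # the storage service does the work
    time.sleep(5)
    props = dest.get_blob_properties()

# copy.status is now "success", "failed", or "aborted"; notify accordingly.
print("copy finished with status:", props.copy.status)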
I am building a website backend where a client submits a request to perform an operation that is expensive in time. The expensive operation also involves gathering a set of information before it can complete.
The work that the client submits can be fully described by a uuid. I am hoping to use a service-oriented architecture (SOA), i.e. multiple microservices.
The client communicates with the backend using RESTful communication over HTTP. I plan to use a queue that the workers performing the expensive operation can poll for work. The queue has persistence and offers decent reliability semantics.
One consideration is whether I gather all of the data needed for the expensive operation upstream and then enqueue all of that data or whether I just enqueue the uuid and let the worker fetch the data.
Here are diagrams of the two architectures under consideration:
Push-based (i.e. gather data upstream):
Pull-based (i.e. worker gathers the data):
Some things that I have thought of:
In the push-based case, I would likely be blocking while I gathered the needed data, so the client's HTTP request would not be responded to until the data had been gathered and enqueued. From a UI standpoint, the request would be pending until the response comes back.
In the pull-based scenario, only the worker needs to know what data is required for the work. That means I can have multiple types of clients talking to various backends. If the data needs change, I update just the workers and not each of the upstream services.
Anything else that I am missing here?
Another benefit of the pull-based approach is that you don't have to worry about the data getting stale in the queue.
I think you already pretty much explained why the second (pull-based) approach is better.
If a user's request is going to be processed asynchronously anyway, why wait for the data to be gathered before returning a response? You just need to enqueue a work item and return the HTTP response.
Passing data via the queue is not a good option. If you gather the data upstream, you will have to pass it to the worker some way other than via the queue (usually BLOB storage). That is additional work that is not really needed in your case.
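A minimal sketch of the pull-based flow, with an in-memory queue standing in for your persistent one and hypothetical fetch_inputs / run_expensive_operation helpers:

# Hypothetical sketch of the pull-based flow: the handler enqueues only the
# uuid; the worker gathers the data itself. The in-memory queue is a
# stand-in for a persistent queue.
import queue
import uuid

work_queue = queue.Queue()

def fetch_inputs(job_id):
    return {}  # placeholder: gather whatever the operation needs

def run_expensive_operation(job_id, data):
    pass  # placeholder for the expensive work

def handle_request():
    job_id = str(uuid.uuid4())
    work_queue.put(job_id)                    # enqueue just the identifier
    return {"status": 202, "job_id": job_id}  # respond immediately

def worker():
    while True:
        job_id = work_queue.get()
        data = fetch_inputs(job_id)           # the worker fetches what it needs
        run_expensive_operation(job_id, data)
        work_queue.task_done()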
I would recommend Cadence Workflow instead of queues, as it supports long-running operations and state management out of the box.
Cadence offers a lot of other advantages over using queues for task processing:
Built-in exponential retries with an unlimited expiration interval
Failure handling. For example, it allows executing a task that notifies another service if both updates couldn't succeed within a configured interval.
Support for long-running, heartbeating operations
The ability to implement complex task dependencies, for example chaining of calls or compensation logic in case of unrecoverable failures (SAGA)
Complete visibility into the current state of the update. With queues, all you know is whether there are messages in a queue, and you need an additional DB to track overall progress. With Cadence, every event is recorded.
The ability to cancel an update in flight.
See the presentation that goes over the Cadence programming model.