I have a Pub/Sub topic that I want to consume from a single Cloud Function deployed in multiple regions. The behaviour I observe is that all messages get consumed by a single function (i.e. they all get consumed in one region).
Is there a reason that messages are not being distributed and consumed in other regions?
It is not possible to have a cross-region Pub/Sub trigger with Cloud Functions given that the service is regional. What this means is that even if the same function is deployed to different regions, they are completely independent. However, if they have the exact same name, then only the first Function created will actually receive messages because the subscription name is unique per Cloud Function name and will be tied to this first deployment. If the functions have different names, then every deployment will create a unique subscription and receive the full stream of messages. We are looking at ways to improve this experience.
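If you want to verify what is actually attached to the topic, one option is to list its subscriptions. A minimal sketch with the Python Pub/Sub client (the project and topic IDs below are placeholders):

```python
# Minimal sketch: list the subscriptions attached to the topic, using the
# Python Pub/Sub client; the project and topic IDs are placeholders.
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "my-topic")

# You should see one subscription per distinct function name; as described
# above, it is tied to the first deployment created with that name.
for subscription in publisher.list_topic_subscriptions(topic=topic_path):
    print(subscription)
```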
I am using Cloud Functions to schedule Cloud Tasks; at the scheduled time, the Cloud Task triggers an HTTP endpoint. As of now I have created a single queue with the following configuration:
Max dispatches per second: 500
Max concurrent dispatches: 1000
Max attempts: 5
The Cloud Function is Pub/Sub-triggered. In a single second, Pub/Sub may receive 10000 messages, and in turn the Cloud Function scales and will create 10000 tasks.
Question:
If the scaled Cloud Function instances have to create more tasks and assign them to different queues, how should the function decide when to create queues and how to assign tasks across them, considering cold and warm queue behaviour, to avoid latency?
I read through this official doc, but it is not so clear for beginners: https://cloud.google.com/tasks/docs/manage-cloud-task-scaling#queue
Back to your original question: if your process is time-sensitive and you need to trigger more than 500 requests at the same time, you need to create additional queues (as mentioned in the documentation).
To dispatch the messages across several queues, you need to define the number of queues you need and a sharding key. If you have a numerical ID, you can use the ID modulo X (where X is the number of queues) as the key and use the corresponding queue name. You can also use a hash of your data.
In your process, if the queue exists, add the task to it; otherwise, create it first and then add the task, as sketched below. In any case, you can't have more than 1000 queues.
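A minimal sketch of that get-or-create-and-shard logic, using the Python google-cloud-tasks client (the project, location, queue prefix and target URL are placeholders, not values from your setup):

```python
# Minimal sketch: shard tasks across N queues with a modulo key, creating a
# queue lazily if it does not exist yet. Project, location, queue prefix and
# target URL are placeholders.
from google.api_core.exceptions import NotFound
from google.cloud import tasks_v2

PROJECT = "my-project"
LOCATION = "us-central1"
NUM_QUEUES = 4                                  # number of shards you decided on
TARGET_URL = "https://example.com/handler"      # the HTTP endpoint the task calls

client = tasks_v2.CloudTasksClient()


def enqueue(entity_id: int, payload: bytes) -> None:
    # Sharding key: a numeric ID modulo the number of queues picks the shard.
    shard = entity_id % NUM_QUEUES
    queue_name = client.queue_path(PROJECT, LOCATION, f"work-queue-{shard}")

    # If the queue does not exist yet, create it (a freshly created queue is "cold").
    try:
        client.get_queue(name=queue_name)
    except NotFound:
        client.create_queue(
            parent=f"projects/{PROJECT}/locations/{LOCATION}",
            queue={"name": queue_name},
        )

    # Add the HTTP task to the selected shard.
    client.create_task(
        parent=queue_name,
        task={
            "http_request": {
                "http_method": tasks_v2.HttpMethod.POST,
                "url": TARGET_URL,
                "body": payload,
            }
        },
    )
```

If latency matters, pre-creating the shard queues ahead of time avoids paying the queue-creation and warm-up cost on the request path.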
I have a Pub/Sub topic with roughly 1 message per second published. The message size is around 1 KB. I need to get these data in real time into both Cloud SQL and BigQuery.
The data are coming at a steady rate and it's crucial that none of them get lost or delayed. Writing them multiple times into the destination is not a problem. The total size of the data in the database is around 1 GB.
What are the advantages and disadvantages of using Google Cloud Functions triggered by the topic versus Google Dataflow to solve this problem?
Dataflow is focused on transforming the data before loading it into a sink. The streaming mode of Dataflow (Beam) is very powerful when you want to perform computations over windowed data (aggregate, sum, count, ...). If your use case requires a steady rate, Dataflow can be a challenge when you deploy a new version of your pipeline (fortunately this is easily handled if duplicated values aren't a problem!).
Cloud Functions are the glue of the cloud, and from your description they seem a perfect fit. On the topic, create 2 subscriptions and 2 functions (one on each subscription): one writes to BigQuery, the other to Cloud SQL. This parallelisation gives you the lowest processing latency.
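A rough illustration of the BigQuery half of that pair, assuming a first-generation, Pub/Sub-triggered Python Cloud Function with JSON message bodies (the table ID is a placeholder):

```python
# Minimal sketch of the BigQuery-writing function, assuming a Pub/Sub-triggered
# (1st gen) Python Cloud Function receiving JSON payloads; the table ID is a
# placeholder.
import base64
import json

from google.cloud import bigquery

bq = bigquery.Client()
TABLE_ID = "my-project.my_dataset.events"


def pubsub_to_bigquery(event, context):
    """Decode the Pub/Sub message and stream it into BigQuery."""
    row = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    errors = bq.insert_rows_json(TABLE_ID, [row])
    if errors:
        # Raising lets Pub/Sub redeliver the message; duplicates are acceptable here.
        raise RuntimeError(f"BigQuery insert failed: {errors}")
```

The Cloud SQL function on the second subscription would look the same, with the insert replaced by a write through whatever database driver you already use.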
I am currently working on a Service Bus trigger (using C#) which copies and moves related blobs to another blob storage and Azure Data Lake. After copying, the function has to emit a notification to trigger further processing tasks. Therefore, I need to know when the copy/move task has finished.
My first approach was to use an Azure Function which copies all these files. However, Azure Functions have a processing time limit of 10 minutes (when manually set), and therefore it seems not to be the right solution. I was considering calling AzCopy or StartCopyAsync() to perform an asynchronous copy, but as far as I understand, the processing time of the function will be as long as AzCopy takes. To solve the time-limit problem I could use WebJobs instead, but there are also other technologies like Logic Apps, Durable Functions, Batch jobs, etc., which leaves me confused about choosing the right technology for this problem. The function won't be called every second but might copy large amounts of data. Does anybody have an idea?
I just found out that Azure Functions only have a time limit when using the Consumption plan. If there is no better solution for blob copy tasks, I'll go with Azure Functions.
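Worth noting: the "start copy" style blob APIs kick off a server-side copy and return immediately, so the function does not stay busy for the duration of the copy; it only needs to poll (or check later) for completion before emitting the notification. A minimal sketch of that pattern, shown here with the Python azure-storage-blob client purely for illustration (the connection string, container and blob names are placeholders):

```python
# Minimal sketch of a server-side blob copy with status polling, using the
# Python azure-storage-blob client for illustration; the connection string,
# container and blob names are placeholders.
import time

from azure.storage.blob import BlobClient

SOURCE_URL = "https://srcaccount.blob.core.windows.net/source-container/big.dat"

dest = BlobClient.from_connection_string(
    "<destination storage connection string>",
    container_name="dest-container",
    blob_name="big.dat",
)

# Start the copy: the storage service performs it and the call returns at once.
dest.start_copy_from_url(SOURCE_URL)

# Poll until the service reports the copy as finished, then notify downstream.
while True:
    status = dest.get_blob_properties().copy.status
    if status != "pending":
        break
    time.sleep(5)

print(f"copy finished with status: {status}")  # e.g. "success"
```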
I am offering a RESTful API to clients that access it as a web service. Now I would like to be able to count the API calls per month.
What would be the best way to count the calls? Incrementing a DB field would mean one more DB call per request. Is there a workaround? We are talking about millions of API calls per month.
You can also log to a text file and use log-analysis tools such as Webalizer (http://www.webalizer.org/) to analyze the text files.
HTH
You can use a separate in-memory database to track these values, and write them to disk occasionally. Or store calls in a collection and batch-write them to the database occasionally.
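A minimal sketch of that batching approach in Python, assuming a thread-safe in-memory counter that is flushed periodically (flush_to_db is a stand-in for your real persistence):

```python
# Minimal sketch: count API calls in memory and batch-write them periodically.
# flush_to_db() is a stand-in for your real database write.
import threading
from collections import Counter

_lock = threading.Lock()
_counts = Counter()                 # api_key -> calls since the last flush
FLUSH_INTERVAL_SECONDS = 60


def record_call(api_key: str) -> None:
    """Called once per API request; touches only memory, no DB round trip."""
    with _lock:
        _counts[api_key] += 1


def flush_to_db() -> None:
    """Persist and reset the counters: one batched write instead of one per call."""
    with _lock:
        snapshot = dict(_counts)
        _counts.clear()
    for api_key, calls in snapshot.items():
        # Replace with your real write, e.g. UPDATE usage SET calls = calls + ? ...
        print(f"{api_key}: +{calls}")


def _schedule_flush() -> None:
    flush_to_db()
    threading.Timer(FLUSH_INTERVAL_SECONDS, _schedule_flush).start()


_schedule_flush()
```

The trade-off is that a crash loses at most one flush interval of counts, which is usually acceptable for usage statistics.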
I am doing some testing on Google Apps Script's ContactsApp and loading in contacts. It looks like it takes as much time to run ContactsApp.getContacts() (loading all contacts) as it does to run ContactsApp.getContact('email') (a specific contact): about 14 seconds for each method with my contacts.
My assumption is that both methods fetch all contacts and the second one only matches on email. This drags quite a bit.
Has anyone confirmed this, and is there any way to keep the loaded contacts in memory between pages (a session variable?)?
You've got a few options for storing per-user data:
If it's a small amount of data, you can use User Properties
You can store much more data using ScriptDb, but this will be global, so you'll have to segment off user data yourself
If you only need the data for a short amount of time, say, between function calls, you can use the Cache Service. You'll want to use getPrivateCache()
It sounds like for your use case getPrivateCache() is your best option for user-specific, session-like data storage.
(Just make sure your intended use fits within the terms of service.)