Why is my Pub/Sub message acknowledged? -> Data loss - google-cloud-functions

I have a Cloud Function that is triggered by Pub/Sub. It should ACK the message only when the message has been processed correctly.
In my logs I see a memory failure, followed by the message 'Finished with status: ok'.
The message is then acknowledged and removed from my Pub/Sub topic!
Reproducer:
import base64
import requests
def hello_pubsub(event, context):
    """ Triggered from a message on a Cloud Pub/Sub topic.
    Args:
        event (dict): Event payload.
        context (google.cloud.functions.Context): Metadata for the event.
    """
    pubsub_message = base64.b64decode(event['data']).decode('utf-8')
    # let's say that important stuff is done at the line below
    # which should be retried in case of failure
    r = requests.get('https://zoom.us/client/latest/zoomusInstallerFull.pkg')
    print(pubsub_message)

If the Cloud Function is executed at all, it follows that the topic has already received the message.
The message is ACKed if your function completes successfully by returning a resolved promise. Pub/Sub retries the message if the function throws an error or returns a rejected promise.
Therefore, to prevent endless retry loops, you must build a retry mechanism that tries a predetermined number of times and then gives up. This is necessary for the case where your Cloud Function is unable to process the received message.
Now, in line with the previous comment, try enabling retry on failure when an execution ID is associated with the request that resulted in the error. With this enabled, function execution is retried when a retriable exception is raised.
As for the exact error message shown in Cloud Functions, “Function invocation was interrupted. Error: function terminated”, it would be advisable to check the logs for termination-related information.
Here is further troubleshooting information from the Logging section: setting up logging to help track down problems can cause problems of its own.
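The bounded retry suggested in this answer could look roughly like the Python sketch below. The helper name run_with_retries and its parameters are made up for illustration; they are not part of any Google library. The helper retries a callable a fixed number of times and then re-raises the last error.

```python
import time


def run_with_retries(task, max_attempts=3, delay_seconds=0.0):
    """Call task() until it succeeds or max_attempts is reached.

    Returns the task's result on success and re-raises the last
    exception after the final failed attempt.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as error:  # broad on purpose: this is only a sketch
            last_error = error
            print(f"attempt {attempt}/{max_attempts} failed: {error}")
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    raise last_error
```

Whether the function should re-raise after the last attempt (so the failure is visible to Pub/Sub) or catch the error, log it and return normally (so the message is dropped) is a design decision that depends on whether retry on failure is enabled.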

Related

Webhook handler for Stripe event ... of type <several events> failed: User not found

I have a test mode product which I am generating a payment link for, passing a uid encoded in hex through a client_reference_id param. I have a Firebase Cloud Function event handler and a webhook in my test Stripe dashboard pointing to my event endpoint. Everything seems to be functioning properly, but I'm seeing these errors in my log console for several of the events sent:
invoice.paid
invoice.payment_succeeded
checkout.session.completed
customer.subscription.created
customer.subscription.updated
I'm seeing these in my Google Cloud Logs Explorer. The error is as stated in the description. I don't see a stacktrace, but there is a trace path. I found them in Google Cloud's Trace List, but there's no stacktrace in there either. Just the sequence that got me to the error:
/ext-firestore-stripe-payments-handleWebhookEvents (2645.429 ms)
Handling Stripe event [<event name>] of type <event type>
❗️[Error]: Webhook handler for Stripe event [<eventname>] of type [checkout.session.completed] failed: User not found!
[Open in Logs Viewer]
Full Log Entry
{
  textPayload: "❗️[Error]: Webhook handler for Stripe event [<eventname>] of type [checkout.session.completed] failed: User not found!"
  insertId: "<id>"
  resource: {2}
  timestamp: "2023-02-04T00:24:11.569373Z"
  severity: "ERROR"
  labels: {2}
  logName: "<path>"
  trace: "<path>"
  receiveTimestamp: "2023-02-04T00:24:11.661364910Z"
}
and finally
4. Function execution took 2248 ms, finished with status code: 200
It's just not clear to me what is causing this error. Since I am using a payment link, I don't know how I can have a user besides what the user enters for themselves in the payment screen (email, etc).
The issue here was that there was a second endpoint with a different name which I had created while testing which was still enabled. Since I changed the name, it appears the extension was reporting errors on the firebase side. Deleting / disabling the endpoint solves the reported errors.

When do GCP cloud functions acknowledge pub/sub messages?

I have a cloud function that gets triggered from a pub/sub message. This function never explicitly acknowledges the message in the source code.
So when does this function acknowledge the pub/sub message if the acknowledgement never happens in the source code?
Update: when a function crashes, I understand that the message shouldn't be acknowledged, and yet a new function invocation for that message never appears in the logs.
Reproducible Example
Create a pubsub topic called test_topic
Create a cloud function called test_function with trigger test_topic. Give it all the default settings including NOT retrying on failure. In the code itself, set the language to python3.7 with entry point of hello_pubsub and the following code:
import base64
def hello_pubsub(event, context):
    pubsub_message = base64.b64decode(event['data']).decode('utf-8')
    print(pubsub_message)
    raise RuntimeError('error in function')
The requirements.txt remains blank
Go into test_topic and publish a message with go as the text.
There will be an error in the test_function logs. However there will only be one function invocation with the error and this will remain the case even after a few days or so.
If the function finishes gracefully, the message is acknowledged. If the function exits with an error, the message is NACKed.
EDIT 1
I have tested with a Go background function. You need to deploy your Cloud Function with the --retry parameter to allow failed messages to be retried. Otherwise, the messages aren't retried.
In Go, here are the cases where retries are performed:
Return an error (equivalent to an exception in Java or Python): status "error" in the logs
Call log.Fatal() (exits the function, i.e. a function crash, with a specific log): status "connection error" in the logs
Perform an explicit exit: status "connection error" in the logs
Here is the code (if interested):
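// Package name is an assumption; the original snippet omitted it.
package function

import (
    "context"
    "errors"
    "log"
    "os"
)

// PubSubMessage is the payload of a Pub/Sub event.
type PubSubMessage struct {
    Data []byte `json:"data"`
}

// PubsubError fails in a different way depending on the message text.
func PubsubError(ctx context.Context, m PubSubMessage) error {
    switch string(m.Data) {
    case "ok":
        return nil
    case "error":
        return errors.New("it's an error") // status "error" in the logs
    case "fatal":
        log.Fatal("crash") // status "connection error" in the logs
    case "exit":
        os.Exit(1) // status "connection error" in the logs
    }
    return nil
}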
And here is how I deployed my Cloud Function:
gcloud beta functions deploy --runtime=go113 --trigger-topic=test-topic \
--source=function --entry-point=PubsubError --region=us-central1 \
--retry pubsuberror
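With --retry enabled, a message whose handler always fails keeps being redelivered. One guard against that, sketched here in Python rather than Go, is to drop events that are older than some cutoff. This is only a sketch under assumptions: MAX_AGE_SECONDS is an arbitrary value, and context.timestamp is assumed to be an RFC 3339 UTC string ending in 'Z' (for example '2020-05-06T07:33:34.556Z').

```python
from datetime import datetime, timezone

MAX_AGE_SECONDS = 600  # hypothetical cutoff, pick what suits your workload


def event_age_seconds(timestamp, now=None):
    """Return the age in seconds of an RFC 3339 UTC timestamp ending in 'Z'."""
    now = now or datetime.now(timezone.utc)
    # Drop the trailing 'Z' and any fractional seconds; whole seconds are
    # precise enough for a cutoff check.
    whole_seconds = timestamp.rstrip("Z").split(".")[0]
    event_time = datetime.strptime(whole_seconds, "%Y-%m-%dT%H:%M:%S")
    return (now - event_time.replace(tzinfo=timezone.utc)).total_seconds()


def hello_pubsub(event, context):
    if event_age_seconds(context.timestamp) > MAX_AGE_SECONDS:
        print(f"Dropping event {context.event_id}: too old")
        return  # returning normally means the message is acknowledged
    # Stands in for real work that failed; raising triggers a retry
    # when the function is deployed with --retry.
    raise RuntimeError("error in function")
```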
Based on this description:
Google Cloud Pub/Sub Triggers
Cloud Functions acks the Pub/Sub message internally upon successful function invocation.
I understand that documentation quote to mean that the acknowledgement happens only after the code has finished executing without any (uncaught) errors.
At the same time, while the execution might still be 'in progress', the Pub/Sub service may make a decision to trigger another cloud function (instance) from the same Pub/Sub message.
Some additional details are in this Issue Tracker discussion:
Cloud Function explicit acknowledgement of a pubsub message
From my point of view, regardless of whether the invocation was 'successful' or not, the cloud function should be developed in an idempotent way, taking into account the 'at least once delivery' paradigm of the Pub/Sub service. In other words, the cloud function should be developed in such a way that multiple invocations from one message are handled correctly.
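As a rough illustration of the idempotent approach described above, here is a minimal Python sketch that skips messages it has already handled. It assumes context.event_id stays the same across redeliveries of one message, and it uses an in-memory set purely for demonstration; a real function would need a durable store, since instances do not share memory. The names handle_once, _processed_ids and results are hypothetical.

```python
import base64

# Assumption: an in-memory set stands in for a durable store such as a
# database table keyed by message ID.
_processed_ids = set()
results = []  # placeholder for the function's real side effect


def handle_once(event, context):
    """Process a Pub/Sub event at most once per event ID.

    Returns True if the message was processed, False if it was a duplicate.
    """
    message_id = context.event_id
    if message_id in _processed_ids:
        print(f"Skipping duplicate delivery of {message_id}")
        return False
    payload = base64.b64decode(event["data"]).decode("utf-8")
    results.append(payload)
    # Record the ID only after the work succeeded, so a failed attempt
    # can still be retried.
    _processed_ids.add(message_id)
    return True
```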

How to handle "Unexpected EOF at target" error from API calls?

I'm creating a Forge application which needs to get version information from a BIM 360 hub. Sometimes it works, but sometimes (usually after the code has already been run once this session) I get the following error:
Exception thrown: 'Autodesk.Forge.Client.ApiException' in mscorlib.dll
Additional information: Error calling GetItem: {
  "fault":{
    "faultstring":"Unexpected EOF at target",
    "detail": {
      "errorcode":"messaging.adaptors.http.flow.UnexpectedEOFAtTarget"
    }
  }
}
The above error will be thrown from a call to an api, such as one of these:
dynamic item = await itemApi.GetItemAsync(projectId, itemId);
dynamic folder = await folderApi.GetFolderAsync(projectId, folderId);
var folders = await projectApi.GetProjectTopFoldersAsync(hubId, projectId);
Where the apis are initialized as follows:
ItemsApi itemApi = new ItemsApi();
itemApi.Configuration.AccessToken = Credentials.TokenInternal;
The Ids (such as 'projectId', 'itemId', etc.) don't seem to be any different when this error is thrown and when it isn't, so I'm not sure what is causing the error.
I based my application on the .Net version of this tutorial: http://learnforge.autodesk.io/#/datamanagement/hubs/net
But I adapted it so I can retrieve multiple nodes asynchronously (for example, all of the nodes a user has access to) without changing the jstree. I did this to allow extracting information in the background without disrupting the user's workflow. The main change I made was to add another Route on the server side that calls "GetTreeNodeAsync" (from the tutorial) asynchronously on the root of the tree and then calls it on each of the returned children, then each of their children, and so on. The function waits until all of the nodes are processed using Task.WhenAll, then returns data from each of the nodes to the client.
This means that there could be many api calls running asynchronously, and there might be duplicate api calls if a node was already opened in the jstree and then its information is requested for the background extraction, or if the background extraction happens more than once. This seems to be when the error is most likely to happen.
I was wondering if anyone else has encountered this error, and if you know what I can do to avoid it, or how to recover when it is caught. Currently, after this error occurs, it seems that every other api call will throw this error as well, and the only way I've found to fix it is to rerun the code (I use Visual Studio so I just rerun the server and client, and my browser launches automatically).
Those are sporadic errors from our Apigee router due to latency issues in the authorization process that we are currently looking into internally.
When they occur, please cease all your upcoming requests, wait for a few minutes and retry. Take a look at stuff like this or this to help you out.
Our existing reports of similar errors also seem to point to concurrency as one of the factors leading to the issue, so you might want to limit your concurrent requests and see if that mitigates the issue.
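The two suggestions in this answer (pause and retry, and cap concurrent requests) can be sketched generically. The example below is in Python with asyncio rather than the .NET Forge SDK, and every name in it (call_with_limit, fetch_all, the default limits and wait times) is an assumption for illustration only. In .NET, a SemaphoreSlim would play a similar role to the semaphore here.

```python
import asyncio


async def call_with_limit(semaphore, make_request, max_attempts=3, wait_seconds=1.0):
    """Run make_request() under the semaphore, pausing and retrying on failure."""
    for attempt in range(1, max_attempts + 1):
        async with semaphore:
            try:
                return await make_request()
            except Exception:
                if attempt == max_attempts:
                    raise
        # Pause outside the semaphore so a waiting call does not hold a slot.
        await asyncio.sleep(wait_seconds * attempt)


async def fetch_all(request_factories, max_concurrent=4):
    """Run all requests, with at most max_concurrent in flight at once."""
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [call_with_limit(semaphore, factory) for factory in request_factories]
    return await asyncio.gather(*tasks)
```

Each entry in request_factories is a zero-argument callable returning an awaitable, so a failed call can be re-created for the retry. The answer recommends waiting a few minutes, so a real wait_seconds would be much larger than the default used here.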

AWS Lambda function working properly despite throwing timeout error and not resolving callback

I have a Lambda function that receives a message from SNS and uses a custom module that queries an external database and outputs calculations to the database. The module works fine: the Lambda has internet access via VPC and successfully connects to the database and outputs the desired data to the database, but I am still getting the error "Task timed out after 3.00 seconds." The module itself uses sequelize, async/await, and promises.
I increased the max timeout and the only difference was that the number of seconds in the error message increased to the timeout limit. I tried reserving concurrency and the error persists. Every part of my function works great other than the fact that the callback never resolves, producing a timeout error. I have tried running the function with and without the "context.callbackWaitsForEmptyEventLoop" statement, running it with the statement only makes the code return before any of the rating engine function is completed. Here is the rating engine code: https://github.com/elizajanus/rating-engine-module
Is it possible that the database connection is not closing within my custom module and preventing the code from fully completing the imported function? Or could it be something else? This issue may be connected: https://github.com/sequelize/sequelize/issues/8468
const {RatingEngine} = require('./rating-engine');
exports.handler = (event, context, callback) => {
    const message = event.Records[0].Sns.Message;
    RatingEngine(message, message.d_customer_id,
        message.d_total_distance_travelled);
    callback(null, 'move record created in database');
};
@Eliza Janus, you can test your code locally with https://www.npmjs.com/package/lambda-local; this will help you better identify the problem by debugging the code.

Getting receive pipeline error information in BAM

I have two orchestrations. One of them is used as an error handler for the other orchestration, and is getting failed messages from it. I have set this up in BAM. The problem is that when a file fails in the receive port, I don't get any useful information in the Activity Search, only that something has been registered. Example data from BAM:
ActivityID: 2738a492-04c7-4887-9ff3-6902f435bda4
ErrorCode:
ErrorDesc:
Filename:
Progress Error: Handled
TransactionId:
rcvPort:
sndPort:
In the tracking profiler I use the properties from the errorReporter, e.g. ErrorReport.FailureCode. The file gets moved as it should by the error handler orchestration.
Does someone know what I'm doing wrong?
Is it possible to get any information when a file fails in the receive stage?
Mostly I need the filename and the error code/desc. (the Progress Error is a progress activity I have created).
I worked it out. For some reason I couldn't trace it to BAM if I made a progress dimension for the messages. When I just stored the plain data it worked OK.