Execution ID on Google Cloud Run

I am wondering whether there is an execution ID in Cloud Run like the one in Google Cloud Functions?
An ID that identifies each invocation separately is very useful with "Show matching entries" in Cloud Logging to get all the logs related to one execution.
I understand the execution model is different, since Cloud Run allows concurrency, but is there a workaround to attach each log entry to a specific request?
My real need is to group a request and its response together. Right now I print them separately, and if a few requests arrive at the same time I can't tell which response corresponds to which request...
Thank you for your attention!

OpenTelemetry looks like a great solution, but the learning curve and setup time aren't negligible, so I'm going with a custom ID created in before_request, stored in Flask's g, and included in every print().
import uuid
from flask import g

@app.before_request
def before_request_func():
    # one ID per incoming request, available anywhere via flask.g
    g.execution_id = uuid.uuid4()
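The same ID can then be included in every log line so the request and its response can be matched later. A minimal sketch of that part, assuming app is the Flask application from above; the log() helper and the handler are hypothetical, not part of Flask:

from flask import request

def log(message):
    # Hypothetical helper: prefix each log line with the per-request ID
    # so "Show matching entries" in Cloud Logging groups them together.
    print(f"[{g.execution_id}] {message}")

@app.route("/", methods=["POST"])
def handler():
    log(f"request: {request.get_data(as_text=True)}")
    response = "ok"  # placeholder for the real work
    log(f"response: {response}")
    return response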

Related

Supplying 1 JDBC request result to multiple threads in JMeter

I'm facing an issue in JMeter. The API I am testing gets its parameters from a prior JDBC request.
This works fine when there is only 1 thread, but when I run multiple threads it throws the error below:
{"Message": "A transient error has occurred. Please try again. (1205)","Data":null}
I need to run 5 threads without having to run the JDBC request 5 times.
Can I retrieve 5 results in 1 JDBC call and supply one result to each of the threads? Is this possible, and how can I do it?
Worst-case scenario, I will have to set up a CSV file manually instead of using JDBC calls.
Normally people use a setUp Thread Group for test data preparation and a tearDown Thread Group for eventual clean-up. I would suggest moving your JDBC Request under a setUp Thread Group and running it with 1 virtual user.
If you have to keep the test plan structure as it is and can amend the SQL query to return more results, be aware that according to the JDBC Request sampler documentation the results look like:
myVar_#=5
myVar_1=foo
myVar_2=bar
myVar_3=baz
myVar_4=qux
myVar_5=corge
Therefore you can access the values using a combination of the __V() and __threadNum() functions, like:
${__V(myVar_${__threadNum},)}

How to optimise calling "sub" Cloud Functions in parallel from an HTTP triggered function

I have thousands of log files in a cloud storage bucket that I need to process and aggregate using an HTTP triggered cloud function and am looking for an approach to compute the task in the fastest possible way using parallelization.
At the moment, I have two cloud functions (nodejs 8):
The "main" function which a user is calling directly passing a list of log files that need to be processed; the function calls the "child" function for each provided log file that I also trigger with an HTTP request run parallel using async.each. The "child" function processes a single log file and returns the data to the "main" function which aggregates the results and, once all files are processed, sends the results back to the user.
If I call a child function directly, it takes about 1 second to complete a single file. I'd hope that if I call the main function to process 100 files in parallel the time will still be more or less 1 second. The first file in a batch is indeed returned after 1 second, but the time increases with every single file and the 100th file is returned after 7 seconds.
The most likely culprit is the fact that I'm running the child function using an HTTP request, but I haven't found a way to call them "internally". Is there another approach specific to Google Cloud Functions or maybe I can somehow optimise the parallelisation of HTTP requests?
The easiest approach is to simply share the code that does whatever the child function does, and invoke it directly from the main function. For some cases, it's simply easier and costs less due to fewer function invocations.
See also: Calling a Cloud Function from another Cloud Function
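The question uses Node.js 8, but the idea is language-agnostic: factor the per-file work into a shared module and run it concurrently inside the main function instead of fanning out over HTTP. A rough sketch of that approach in Python; process_log_file and the result shape are hypothetical placeholders:

from concurrent.futures import ThreadPoolExecutor

def process_log_file(file_name):
    # Hypothetical shared helper: whatever the "child" function did with a
    # single log file, now imported and called directly instead of over HTTP.
    return {"file": file_name, "lines": 0}  # placeholder result

def main(request):
    file_names = request.get_json()["files"]
    # Run the per-file work concurrently inside the same function instance,
    # so there is no extra HTTP hop or cold start per file.
    with ThreadPoolExecutor(max_workers=32) as pool:
        results = list(pool.map(process_log_file, file_names))
    # Aggregate however you need and return everything to the caller.
    return {"processed": len(results), "results": results}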

Google Cloud Function: lazy loading not working

I deployed a Google Cloud Function that lazily loads data from Google Datastore. The last update time of my function is 7/25/18, 11:35 PM. It worked well last week.
Normally, if the function is called again within about 30 minutes of the last call, it does not need to reload the data from Google Datastore. But since yesterday the lazy loading has stopped working, even when the time between two calls is less than 1 minute.
Has anyone run into the same problem? Thanks!
Cloud Functions can fail for several reasons, such as uncaught exceptions or internal process crashes, so you should check the log files / HTTP response error messages to find the root cause and determine whether the function is being restarted and hitting function execution timeouts, which would explain why your lazy loading is not working.
I suggest you take a look at the Reporting Errors documentation, which explains how to return a function error so you can see the exact error message thrown by the service and return errors in the recommended way. Keep in mind that when errors are returned correctly, the function instance that returned the error is labelled as behaving normally, avoiding the cold starts that lead to higher latency, and it remains available to serve future requests if need be.
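For reference, the lazy-loading pattern the question describes usually relies on a module-level variable that survives between invocations while an instance stays warm and is only repopulated after a cold start. A minimal sketch of that pattern; load_from_datastore() and handler() are hypothetical names, not client-library calls:

cached_data = None  # module scope: reused while the instance stays warm

def load_from_datastore():
    # Hypothetical stand-in for the real Datastore query.
    return {"loaded_at": "cold start"}

def handler(request):
    global cached_data
    if cached_data is None:
        # Only hit Datastore on a cold start; warm invocations reuse the cache.
        cached_data = load_from_datastore()
    return str(cached_data)

If the instance crashes or is restarted (for example after an uncaught exception), the module is reloaded and the cache is empty again, which would match the behaviour described above.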

How do I create a lot of sample data for firestore?

Let's say I need to create a lot of different documents/collections in Firestore, and I need to add them quickly, e.g. by copy-pasting JSON. I can't do that with the standard Firebase console, because adding 100 documents would take me forever. Is there any solution for bulk-creating mock data with a given structure in a Firestore DB?
If you switch to the Cloud Console (rather than Firebase Console) for your project, you can use Cloud Shell as a starting point.
From the Cloud Shell environment you'll find tools like node and python installed and available. Using whichever one you prefer, you can write a script using the Server Client libraries.
For example in Python:
from google.cloud import firestore
import random
MAX_DOCUMENTS = 100
SAMPLE_COLLECTION_ID = u'users'
SAMPLE_COLORS = [u'Blue', u'Red', u'Green', u'Yellow', u'White', u'Black']
# Project ID is determined by the GCLOUD_PROJECT environment variable
db = firestore.Client()
collection_ref = db.collection(SAMPLE_COLLECTION_ID)
for x in range(MAX_DOCUMENTS):
    collection_ref.add({
        u'primary': random.choice(SAMPLE_COLORS),
        u'secondary': random.choice(SAMPLE_COLORS),
        u'trim': random.choice(SAMPLE_COLORS),
        u'accent': random.choice(SAMPLE_COLORS)
    })
While this is the easiest way to get up and running with a static dataset, it leaves a little to be desired. Namely, Firestore needs live, dynamic data to exercise its functionality, such as real-time queries. For that, using Cloud Scheduler and Cloud Functions is a relatively easy way to update the sample data regularly.
In addition to the sample generation code, you specify the update frequency in Cloud Scheduler. For instance, */10 * * * * defines a frequency of every 10 minutes in the standard unix-cron format.
For non-static data, a timestamp is often useful. Firestore provides a way to have a timestamp from the database server added as one of the fields at write time:
u'timestamp': firestore.SERVER_TIMESTAMP
It is worth noting that timestamps like this will hotspot in production systems if not sharded correctly. Typically 500 writes/second to the same collection is the maximum you will want, so that the index doesn't hotspot. Sharding can be as simple as each user having their own collection (500 writes/second per user). For this example, however, writing 100 documents every minute via a scheduled Cloud Function is definitely not an issue.
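Putting the pieces together, the scheduled function can simply write a batch of sample documents with a server timestamp each time Cloud Scheduler triggers it. A minimal sketch, assuming an HTTP-triggered Python function wired to the Cloud Scheduler job above; generate_sample_data is a hypothetical name:

from google.cloud import firestore
import random

SAMPLE_COLORS = [u'Blue', u'Red', u'Green', u'Yellow', u'White', u'Black']

db = firestore.Client()

def generate_sample_data(request):
    # Called by Cloud Scheduler (e.g. every 10 minutes) to keep the data "live".
    collection_ref = db.collection(u'users')
    for _ in range(100):
        collection_ref.add({
            u'primary': random.choice(SAMPLE_COLORS),
            u'secondary': random.choice(SAMPLE_COLORS),
            # Server-side timestamp added at write time.
            u'timestamp': firestore.SERVER_TIMESTAMP,
        })
    return 'ok'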
FireKit is a good resource to use for this purpose. It even allows sub-collections.
https://retroportalstudio.gumroad.com/l/firekit_free

Data Studio connector making multiple calls to API when it should only be making 1

I'm finalizing a Data Studio connector and noticing some odd behavior with the number of API calls.
Where I'm expecting to see a single API call, I'm seeing multiple calls.
In my Apps Script I'm keeping a simple tally which increments by 1 on every URL fetch, and that gives me the number I expect to see from getData().
However, in my API monitoring logs (using Runscope) I'm seeing multiple API requests for the same endpoint, and varying numbers for different endpoints within a single getData() call (they should all be the same).
I can't post the code here (client project) but it's substantially the same framework as the Data Connector code on Google's docs. I have caching and backoff implemented.
Looking for any ideas or if anyone has experienced something similar?
Thanks
Per this reference, GDS will also perform semantic type detection if you aren't explicitly defining this property for your fields. If the query is for semantic type detection, the request will feature sampleExtraction: true:
When Data Studio executes the getData function of a community connector for the purpose of semantic detection, the incoming request will contain a sampleExtraction property which will be set to true.
If the GDS report includes multiple widgets with different dimensions/metrics configuration then GDS might fire multiple getData calls for each of them.
Kind of a late answer but this might help others who are facing the same problem.
The widgets / search filters attached to a graph issue getData calls of their own. If your custom connector retrieves data from third-party services via API calls, and that data is agnostic to the request.fields property sent by GDS, then those API calls are multiplied by N+1 (where N = the number of widgets / search filters your report implements).
I could not find an official solution for this either, so I invented a workaround using the cache.
The graph's request for getData (typically requesting more fields than the Search Filters) will be the only one allowed to query the API Endpoint. Before starting to do so it will store a key in the cache "cache_{hashOfReportParameters}_building" => true.
if (enableCache) {
  cache.putString("cache_{hashOfReportParameters}_building", 'true');
  Logger.log("Cache is being built...");
}
It will retrieve the API responses, paginating in a loop, and buffer the results.
Once it has finished, it will delete the cache key "cache_{hashOfReportParameters}_building" and cache the final merged results it buffered under "cache_{hashOfReportParameters}_final".
When it comes to the filters, they also invoke getData, but typically with only up to 3 requested fields. The first thing we want to do is make sure they cannot start executing before the primary getData call, so we add a small delay for anything that looks like a search filter / widget going after the same data set:
if (enableCache) {
  var countRequestedFields = requestedFields.asArray().length;
  Logger.log("Total requested fields: " + countRequestedFields);
  if (countRequestedFields <= 3) {
    Logger.log('This seems to be a search filter.');
    Utilities.sleep(1000);
  }
}
After that we compute a hash over all of the moving parts of the report (the date range, plus all of the other parameters you have set up that could influence the data retrieved from your API endpoints).
Now the best part: as long as the main graph is still building the cache, we make these getData calls wait:
while (cache.getString('cache_{hashOfReportParameters}_building') === 'true') {
  Logger.log('A similar request is already executing, please wait...');
  Utilities.sleep(2000);
}
After this loop we attempt to retrieve the contents of "cache_{hashOfReportParameters}_final" -- and in case that fails, it's always a good idea to have a backup plan, which is to let the call traverse the API again. We have encountered roughly a 2% error rate when retrieving data we cached...
With the cached result (or the buffered API responses), you just transform the response into the schema GDS needs (which differs between graphs and filters).
As you start implementing this, you'll notice yet another problem... the Apps Script cache is limited to 100KB per key. There is, however, no limit on the number of keys you can cache... and fortunately others have run into similar needs in the past and came up with a smart solution: split the big chunk you need cached across multiple cache keys, and glue them back together into one object when retrieving it.
See: https://github.com/lwbuck01/GASs/blob/b5885e34335d531e00f8d45be4205980d91d976a/EnhancedCacheService/EnhancedCache.gs
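The chunking idea behind that EnhancedCache class is simple and worth understanding even if you use the linked code as-is. A language-agnostic sketch of it in Python (the real implementation is Apps Script; cache here is just a dict-like stand-in for CacheService):

CHUNK_SIZE = 100 * 1024  # per-key value limit in the Apps Script cache (~100KB)

def put_large(cache, key, value):
    # Split the serialized value across numbered keys, plus a chunk-count key.
    chunks = [value[i:i + CHUNK_SIZE] for i in range(0, len(value), CHUNK_SIZE)]
    cache[key + '_count'] = str(len(chunks))
    for i, chunk in enumerate(chunks):
        cache[f'{key}_{i}'] = chunk

def get_large(cache, key):
    # Glue the chunks back together into the original value.
    count = int(cache[key + '_count'])
    return ''.join(cache[f'{key}_{i}'] for i in range(count))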
I cannot share the final solution we have implemented with you as it is too specific to a client - but I hope that this will at least give you a good idea on how to approach the problem.
Caching the full API result is a good idea in general: it avoids unnecessary round trips and server load, as long as near-real-time data is good enough for your needs.