Why would a cloud function fail to return when the logs report success? - google-cloud-functions

I've got a cloud function that takes about 5 minutes to run. I need it to run many times every Monday, so I've set up a Google Cloud Scheduler job that calls it every minute.
The Cloud Scheduler logs:
{
httpRequest: {}
insertId: "1cklvxkf7jdt20"
jsonPayload: {
#type: "type.googleapis.com/google.cloud.scheduler.logging.AttemptFinished"
jobName: "projects/site-speed-dashboard-v2/locations/us-central1/jobs/<job_id>"
status: "UNKNOWN"
targetType: "HTTP"
url: "https://us-central1-<project>.cloudfunctions.net/<function_id>"
}
logName: "projects/<project>/logs/cloudscheduler.googleapis.com%2Fexecutions"
receiveTimestamp: "2021-11-02T22:03:00.584062076Z"
resource: {
labels: {
job_id: "<job_id>"
location: "us-central1"
project_id: "<project>"
}
type: "cloud_scheduler_job"
}
severity: "ERROR"
timestamp: "2021-11-02T22:03:00.584062076Z"
}
The Cloud Function logs look mostly like this:
2021-11-03T20:18:00.867270511Z
lighthouse-runner7jabijqjeuz1 Function execution started
2021-11-03T20:18:01.649Z
lighthouse-runner7jabijqjeuz1 Running task B
2021-11-03T20:18:01.651Z
lighthouse-runner7jabijqjeuz1 Status 1 of 5 on task B
2021-11-03T20:18:37.863Z
lighthouse-runner7jabijqjeuz1 Status 2 of 5 on task B
2021-11-03T20:19:06.586Z
lighthouse-runnervhyzz79vlmyb Saving results for task A...
2021-11-03T20:19:09.585588918Z
lighthouse-runnervhyzz79vlmyb Function execution took 308731 ms, finished with status code: 200
2021-11-03T20:19:14.866Z
lighthouse-runner7jabijqjeuz1 Status 3 of 5 on task B
2021-11-03T20:19:44.667Z
lighthouse-runner7jabijqjeuz1 Status 4 of 5 on task B
2021-11-03T20:20:15.261Z
lighthouse-runner7jabijqjeuz1 Status 5 of 5 on task B
2021-11-03T20:20:45.675Z
lighthouse-runner7jabijqjeuz1 Saving results for task B...
2021-11-03T20:20:48.338318504Z
lighthouse-runner7jabijqjeuz1 Function execution took 167472 ms, finished with status code: 200
2021-11-03T20:21:00.409847950Z
lighthouse-runner7jab3riqsqx7 Function execution started
2021-11-03T20:21:01.205Z
lighthouse-runner7jab3riqsqx7 Running task C
2021-11-03T20:21:01.206Z
lighthouse-runner7jab3riqsqx7 Status 1 of 5 on task C
2021-11-03T20:21:58.668Z
lighthouse-runner7jab3riqsqx7 Status 2 of 5 on task C
2021-11-03T20:23:05.464Z
lighthouse-runner7jab3riqsqx7 Status 3 of 5 on task C
2021-11-03T20:24:01.974Z
lighthouse-runner7jab3riqsqx7 Status 4 of 5 on task C
2021-11-03T20:25:00.416988195Z
lighthouse-runner9o59yxh9wi7g Function execution started
2021-11-03T20:25:01.608Z
lighthouse-runner9o59yxh9wi7g Running task D
2021-11-03T20:25:01.608Z
lighthouse-runner9o59yxh9wi7g Status 1 of 5 on task D
2021-11-03T20:25:08.062Z
lighthouse-runner7jab3riqsqx7 Status 5 of 5 on task C
2021-11-03T20:26:05.760Z
lighthouse-runner7jab3riqsqx7 Saving results for task C...
2021-11-03T20:26:09.605393223Z
lighthouse-runner7jab3riqsqx7 Function execution took 309196 ms, finished with status code: 200
2021-11-03T20:26:27.471Z
lighthouse-runner9o59yxh9wi7g Status 2 of 5 on task D
2021-11-03T20:27:37.853Z
lighthouse-runner9o59yxh9wi7g Status 3 of 5 on task D
Note that there are no errors in the log (I know that the ids don't match up, but it's a very consistent pattern). What's happening here?
When there's no overlap (or often just less overlap) the function runs fine, but I don't have the throughput needed. Running it locally there's no issue, and the function successfully returns 200 every time.

Your error is a common one. Have a look at your Cloud Functions logs:
2021-11-03T20:18:00.867270511Z lighthouse-runner7jabijqjeuz1 Function execution started
...
...
...
2021-11-03T20:20:48.338318504Z lighthouse-runner7jabijqjeuz1 Function execution took 167472 ms, finished with status code: 200
I kept only these two lines because they share the same execution ID, 7jabijqjeuz1. Now note the processing duration: about 2 minutes and 48 seconds.
No problem so far, as I'm sure you increased the Cloud Functions timeout. However, the Cloud Scheduler timeout (the attempt deadline) also defaults to 60 seconds, and if Cloud Scheduler doesn't receive a positive ack within that deadline, the run is considered failed.
Increase the Cloud Scheduler attempt deadline to at least 3 minutes to fix your issue.
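For reference, the attempt deadline can be raised with gcloud; a sketch, keeping the question's `<job_id>` placeholder and us-central1 region (verify the flag against your gcloud version):

```shell
# Default attempt deadline is 60s; raise it past the ~5-minute runtime
# so Cloud Scheduler waits for the function's 200 before declaring failure.
gcloud scheduler jobs update http <job_id> \
  --location=us-central1 \
  --attempt-deadline=360s
```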

Related

How do I assert in Groovy with multiple scenarios?

I am testing an API using ReadyAPI, and currently I am trying to figure out a solution for the following situation:
I run 3 requests in parallel that can affect each other's responses. The test adds an object to one list; the other two lists should return errors, so the item ends up added to only one of them.
So far so good, but when running in parallel, the status codes change:
Ex:
First run:
Test 1 - 200
Test 2 - 400
Test 3 - 400
Cleanup
Second run:
Test 1 -400
Test 2 -200
Test 3 -400
Cleanup
Possibly a 3rd run (it isn't a predictable situation):
Test 1 - 200
Test 2 - 200
Test 3 - 400
The only way I've thought of so far is to create a Groovy assertion function that asserts and returns the current statuses and flags when a result like the third run shows up. Bear in mind that it is not always the third run that produces this result. Any ideas on how that function should work?
def A = 200
def B = 400
def C = 400

def assesmentFunct(a, b, c) {
    assert a = a
    assert b = b
    assert c = c
}

assesmentFunct(A, B, C)
The thing is that I can have 3 scenarios where 200 is possible, so I am thinking of some kind of matrix of assertions.
Please help.
I tried nested if-else and switch-case.
Collect the results into an array and assert on the array:
def results = [200, 400, 400]
assert results in [
    [400, 400, 200],
    [400, 200, 400],
    [200, 400, 400]
]

Handle implementation error with Vivado TCL

I have several implementation runs (each with a different strategy) and I automate running them in Vivado with the following script:
reset_run synth_1
launch_runs synth_1 -jobs 16
wait_on_runs synth_1
# Run all implementations
launch_runs impl_1 -jobs 16
launch_runs impl_2 -jobs 16
launch_runs impl_3 -jobs 16
launch_runs impl_4 -jobs 16
launch_runs impl_5 -jobs 16
launch_runs impl_6 -jobs 16
However, sometimes one of them fails (low memory or a bug in the tools; this is known), and I would like to catch that and do something: maybe run it again, or stop the next steps (for instance, if implementation has failed I don't want to export the hardware, because that would lead to another error since the bitstream can't be found).
Do you know how I can catch this problem within my Tcl script?
I have found a solution, but for some reason it is not among the first results when googling for terms like "vivado catch run failure" and similar, so I'll post an answer:
Based on this Answer Record we can do:
set isAllOk false
set outputOfSynthRun [launch_runs synth_1]
set runStatus [get_property STATUS [get_runs synth_1]]
set runProgress [get_property PROGRESS [get_runs synth_1]]
if { $outputOfSynthRun == 0 && $runStatus == "XST Complete!" && $runProgress == "100%" } {
    set isAllOk true
} else {
    set isAllOk false
}
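The same idea can be extended to the implementation runs, so a failed run blocks the later steps (hardware export, bitstream). A sketch using only the Vivado Tcl commands already shown above plus `get_property PROGRESS`, runnable only inside a Vivado session:

```tcl
# Wait for every implementation run, then collect the ones that failed
set implRuns [list impl_1 impl_2 impl_3 impl_4 impl_5 impl_6]
foreach run $implRuns {
    wait_on_runs $run
}
set failedRuns {}
foreach run $implRuns {
    set progress [get_property PROGRESS [get_runs $run]]
    if { $progress != "100%" } {
        lappend failedRuns $run
        puts "ERROR: $run did not complete (progress: $progress)"
    }
}
if { [llength $failedRuns] > 0 } {
    # Relaunch here if desired; otherwise abort before exporting hardware
    error "Implementation failed for: $failedRuns"
}
```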

Difference in Difference in R (Callaway & Sant'Anna)

I'm trying to implement the DiD package by Callaway and Sant'Anna for my master's thesis, but I'm running into errors when I run the DiD code and when I try to view the summary.
did1 <- att_gt(yname = "countgreen",
gname = "signing_year",
idname = "investorid",
tname = "dealyear",
data = panel8)
This code warns me:
Be aware that there are some small groups in your dataset.
Check groups: 2006, 2007, 2008, 2011.
Dropped 109 observations that had missing data.
overlap condition violated for 2009 in time period 2001
Not enough control units for group 2009 in time period 2001 to run specified regression
The last error is repeated several hundred times.
Does this mean I need to re-match my treatment firms to control firms using a 1:3 ratio (treat:control) rather than the 1:1 I used previously?
Then when I run this code:
summary(did1)
I get this message:
Error in Math.data.frame(list(`mpobj$group` = c(2009L, 2009L, 2009L, 2009L, : non-numeric variable(s) in data frame: mpobj$att
I'm really not too sure what this means.
Can anyone help troubleshoot?
Thanks,
Rory
I don't know the DiD package, but I can answer about the summary(did1) error.
If you do str(did1) you should have something like this:
'data.frame': 6 obs. of 7 variables:
$ cluster : int 1 2 3 4 5 6
$ price_scal : num -0.572 -0.132 0.891 1.091 -0.803 ...
$ hd_scal : num -0.778 0.63 0.181 -0.24 0.244 ...
$ ram_scal : num -0.6937 0.00479 0.46411 0.00653 -0.31204 ...
$ screen_scal: num -0.457 2.642 -0.195 2.642 -0.325 ...
$ ads_scal : num 0.315 -0.889 0.472 0.47 -0.822 ...
$ trend_scal : num -0.604 1.267 -0.459 -0.413 1.156 ...
But in your case you likely have one variable, mpobj$att, that is a factor or a string column.
Maybe fixing that will also make the DiD code run.
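A minimal sketch of that check and fix, assuming (hypothetically) that the estimates are accessible as did1$att; inspect str(did1) first to confirm the actual structure:

```r
# See how the group-time ATT estimates are stored
str(did1$att)

# If they came back as factor/character, coerce to numeric and retry
did1$att <- as.numeric(as.character(did1$att))
summary(did1)
```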

Couchbase benchmark reveals very slow INSERTs and GETs (using KeyValue operations); slower than persisted MySQL data

I did a small benchmark test to compare Couchbase (running in Win) with Redis and MySql (EDIT: added Aerospike to test)
We are inserting 100 000 JSON "documents" into four DBs/stores:
Redis (just insert, there is nothing else)
Couchbase (in-memory Ephemeral buckets, JSON Index on JobId)
MySql (Simple table; Id (int), Data (MediumText), index on Id)
Aerospike (in-memory storage)
The JSON file is 67 lines, about 1800 bytes.
INSERT:
Couchbase: 60-100 seconds (EDIT: seems to vary quite a bit!)
MySql: 30 seconds
Redis: 8 seconds
Aerospike: 71 seconds
READ:
We are reading 1000 times, and we do this 10 times and look at averages.
Couchbase: 600-700 ms for 1000 GETs (Using KeyValue operations, not Query API. Using Query API, this takes about 1500 ms)
MySql: 90-100 ms for 1000 GETs
Redis: 50-60 ms for 1000 GETs
Aerospike: 750 ms for 1000 GETs
Conclusion:
Couchbase seems slowest (the INSERT times seem to vary a lot), and Aerospike is also very slow. Both of these use in-memory storage (Couchbase => Ephemeral bucket, Aerospike => storage-engine memory).
Question: Why are the in-memory writes and reads on Couchbase so slow, even slower than regular MySQL (on an SSD)?
CODE
Note: Using Task.WhenAll, or awaiting each call, doesn't make a difference.
INSERT
Couchbase:
IBucket bucket = await cluster.BucketAsync("halo"); // <-- ephemeral
IScope scope = bucket.Scope("myScope");
var collection = scope.Collection("myCollection");
// EDIT: Added this to avoid measuring lazy loading:
JObject t = JObject.FromObject(_baseJsonObject);
t["JobId"] = 0;
t["CustomerName"] = $"{firstnames[rand.Next(0, firstnames.Count - 1)]} {lastnames[rand.Next(0, lastnames.Count - 1)]}";
await collection.InsertAsync("0", t);
await collection.RemoveAsync("0");
List<Task> insertTasks = new List<Task>();
sw.Start();
foreach (JObject temp in jsonObjects) // jsonObjects is pre-created so it's not a factor in the test
{
    insertTasks.Add(collection.InsertAsync(temp.GetValue("JobId").ToString(), temp));
}
await Task.WhenAll(insertTasks);
sw.Stop();
Console.WriteLine($"Adding {nbr} to Couchbase took {sw.ElapsedMilliseconds} ms");
Redis (using ServiceStack!)
sw.Restart();
using (var client = redisManager.GetClient())
{
    foreach (JObject temp in jsonObjects)
    {
        client.Set($"jobId:{temp.GetValue("JobId")}", temp.ToString());
    }
}
sw.Stop();
Console.WriteLine($"Adding {nbr} to Redis took {sw.ElapsedMilliseconds} ms");
sw.Reset();
Mysql:
MySql.Data.MySqlClient.MySqlConnection mySqlConnection = new MySql.Data.MySqlClient.MySqlConnection("Server=localhost;Database=test;port=3306;User Id=root;password=root;");
mySqlConnection.Open();
sw.Restart();
foreach (JObject temp in jsonObjects)
{
    MySql.Data.MySqlClient.MySqlCommand cmd = new MySql.Data.MySqlClient.MySqlCommand($"INSERT INTO test (id, data) VALUES ('{temp.GetValue("JobId")}', @data)", mySqlConnection);
    cmd.Parameters.AddWithValue("@data", temp.ToString());
    cmd.ExecuteNonQuery();
}
sw.Stop();
Console.WriteLine($"Adding {nbr} to MySql took {sw.ElapsedMilliseconds} ms");
sw.Reset();
READ
Couchbase:
IBucket bucket = await cluster.BucketAsync("halo");
IScope scope = bucket.Scope("myScope");
var collection = scope.Collection("myCollection");
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 1000; i++)
{
    string key = $"{r.Next(1, 100000)}";
    var result = await collection.GetAsync(key);
}
sw.Stop();
Console.WriteLine($"Couchbase Q: {q}\t{sw.ElapsedMilliseconds}");
Redis:
Stopwatch sw = Stopwatch.StartNew();
using (var client = redisManager.GetClient())
{
    for (int i = 0; i < nbr; i++)
    {
        client.Get<string>($"jobId:{r.Next(1, 100000)}");
    }
}
sw.Stop();
Console.WriteLine($"Redis Q: {q}\t{sw.ElapsedMilliseconds}");
MySQL:
MySqlConnection mySqlConnection = new MySql.Data.MySqlClient.MySqlConnection("Server=localhost;Database=test;port=3306;User Id=root;password=root;");
mySqlConnection.Open();
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < nbr; i++)
{
    MySqlCommand cmd = new MySql.Data.MySqlClient.MySqlCommand($"SELECT data FROM test WHERE Id='{r.Next(1, 100000)}'", mySqlConnection);
    using MySqlDataReader rdr = cmd.ExecuteReader();
    while (rdr.Read())
    {
    }
}
sw.Stop();
Console.WriteLine($"MySql Q: {q} \t{sw.ElapsedMilliseconds} ms");
sw.Reset();
Couchbase setup: (screenshots of the cluster configuration and the Bucket Durability setting were attached here)
I only have 1 Node (no cluster), it's local on my machine, running Ryzen 3900x 12 cores, M.2 SSD, Win10, 32 GB RAM.
If you made it this far, here is a GitHub repo with my benchmark code:
https://github.com/tedekeroth/CouchbaseTests
I took your CouchbaseTests and commented out the non-Couchbase bits. I fixed the query to select from the collection (myCollection) instead of jobcache, removed the Metrics option, and created an index on JobId.
create index mybucket_JobId on default:myBucket.myScope.myCollection (JobId)
It inserts the 100,000 documents in 19 seconds and kv-fetches the documents on average 146 usec and query by JobId on average 965 usec.
Couchbase Q: 0 187
Couchbase Q: 1 176
Couchbase Q: 2 143
Couchbase Q: 3 147
Couchbase Q: 4 140
Couchbase Q: 5 138
Couchbase Q: 6 136
Couchbase Q: 7 139
Couchbase Q: 8 125
Couchbase Q: 9 129
average et: 146 ms per 1000 -> 146 usec / request
Couchbase Q: 0 1155
Couchbase Q: 1 1086
Couchbase Q: 2 1004
Couchbase Q: 3 901
Couchbase Q: 4 920
Couchbase Q: 5 929
Couchbase Q: 6 912
Couchbase Q: 7 911
Couchbase Q: 8 911
Couchbase Q: 9 927
average et: 965 ms per 1000 -> 965 usec / request. (coincidentally exactly the same as with the java api).
This was on 7.0 build 3739 on a Mac Book Pro with the cbserver running locally.
######################################################################
I have a small LoadDriver application for the Java SDK that uses the kv api. With 4 threads, it shows an average response time of 54 microseconds and throughput of 73238 requests/second. It uses the travel-sample bucket on a cb server on localhost. git@github.com:mikereiche/loaddriver.git
Run: seconds: 10, threads: 4, timeout: 40000us, threshold: 8000us requests/second: 0 (max), forced GC interval: 0ms
count: 729873, requests/second: 72987, max: 2796us avg: 54us, aggregate rq/s: 73238
For the query API I get the following which is 18 times slower.
Run: seconds: 10, threads: 4, timeout: 40000us, threshold: 8000us requests/second: 0 (max), forced GC interval: 0ms
count: 41378, requests/second: 4137, max: 12032us avg: 965us, aggregate rq/s: 4144
I would have to run such a comparison myself to do a full investigation, but two things stand out.
Your parallel execution isn't truly parallel. async methods run synchronously up to the first await, so all of the code in InsertAsync/GetAsync before the first await runs sequentially as you add your tasks, not in parallel.
CouchbaseNetClient does some lazy connection setup in the background, and you're paying that cost in the timed section. Depending on the environment, including SSL negotiation and such, this can be a significant initial latency.
You can potentially address the first issue by using Task.Run to kick off each operation, but you may need to pre-size the default thread pool.
You can address the second issue by doing at least one operation on the bucket (or calling bucket.WaitUntilReadyAsync()) before the timed section.
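Putting both suggestions together, a minimal sketch of the warm-up plus Task.Run approach, assuming the bucket/scope/collection names from the question (not a drop-in replacement; exact SDK options may differ by version):

```csharp
// Warm-up: open the bucket and wait until it's ready so lazy connection
// setup (socket/SSL negotiation, config fetch) happens outside the timer.
IBucket bucket = await cluster.BucketAsync("halo");
await bucket.WaitUntilReadyAsync(TimeSpan.FromSeconds(10));
var collection = bucket.Scope("myScope").Collection("myCollection");

sw.Restart();
// Task.Run pushes the synchronous prefix of InsertAsync onto the
// thread pool, so adding tasks doesn't serialize that work here.
var insertTasks = jsonObjects
    .Select(temp => Task.Run(() =>
        collection.InsertAsync(temp.GetValue("JobId").ToString(), temp)))
    .ToList();
await Task.WhenAll(insertTasks);
sw.Stop();
```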
60 seconds for inserts still looks abnormal. How many nodes and what Durability setting are you using?

django and celery beat scheduler no database entries

My problem is that the beat scheduler doesn't store entries in the tables 'tasks' and 'workers'. I use Django and Celery. In my database (MySQL) I have added a periodic task "Estimate Region" with an interval of 120 seconds.
This is how I start my worker:
`python manage.py celery worker -n worker.node1 -B --loglevel=info &`
After I start the worker, I can see in the terminal that it works and the scheduler picks the periodic task out of the database and runs it.
This is how my task is defined:
@celery.task(name='fv.tasks.estimateRegion',
             ignore_result=True,
             max_retries=3)
def estimateRegion(region):
The terminal shows this:
WARNING ModelEntry: Estimate Region fv.tasks.estimateRegion(*['ASIA'], **{}) {<freq: 2.00 minutes>}
[2013-05-23 10:48:19,166: WARNING/MainProcess] <ModelEntry: Estimate Region fv.tasks.estimateRegion(*['ASIA'], **{}) {<freq: 2.00 minutes>}>
INFO Calculating estimators for exchange:Bombay Stock Exchange
The task "estimateRegion" produces a results.csv file, so I can see that the worker and the beat scheduler work. But afterwards I have no database entries in "tasks" or "workers" in my Django admin panel.
Here are my celery settings in settings.py
CELERY_DISABLE_RATE_LIMITS = True
CELERY_TASK_SERIALIZER = 'pickle'
CELERY_RESULT_SERIALIZER = 'pickle'
CELERY_IMPORTS = ('fv.tasks')
CELERY_RESULT_PERSISTENT = True
# amqp settings
BROKER_URL = 'amqp://fv:password@localhost'
#BROKER_URL = 'amqp://fv:password@192.168.99.31'
CELERY_RESULT_BACKEND = 'amqp'
CELERY_TASK_RESULT_EXPIRES = 18000
CELERY_ROUTES = (fv.routers.TaskRouter(), )
_estimatorExchange = Exchange('estimator')
CELERY_QUEUES = (
    Queue('celery', Exchange('celery'), routing_key='celery'),
    Queue('estimator', _estimatorExchange, routing_key='estimator'),
)
# beat scheduler settings
CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler"
# development settings
CELERY_RESULT_PERSISTENT = False
CELERY_DEFAULT_DELIVERY_MODE = 'transient'
I hope anyone can help me :)
Have you started celerycam?
python manage.py celerycam
It will take a snapshot (every 1 second by default) of the current state of tasks.
You can read more about it in the celery documentation
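For reference, the snapshot interval is configurable; a sketch (check `python manage.py celerycam --help` for the exact option in your djcelery version):

```shell
# Snapshot task/worker state every 10 seconds instead of the default 1s
python manage.py celerycam --frequency=10.0
```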