Question on agents: I specifically want to create a Periodic Task, but I only want to run it once a day, say at 1am, not every 30 minutes, which is the default. In OnInvoke, do I simply check the hour and run it only if the current hour matches the desired hour?
But the next OnInvoke call will try to run it again 30 minutes later, maybe at 1:31am.
So I guess I'd use a stored boolean in the app settings to mark it as "already run for today" or similar, and then check against that value?
If you specifically want to run a custom action at 1am, I'm not sure that a single boolean would be enough to make it work.
I guess you plan to reset your boolean at 1:31 to prepare for the next day's execution, but what if your periodic task is also called at 1:51 (that is, called more than two times between 1am and 2am)?
How could this happen? Well, maybe it could happen if the device is rebooted, but I'm not quite sure about it. In any case, storing the last execution datetime somewhere and comparing it to the current one is a safer way to ensure that your action is only invoked once per day.
One question remains: where do you store your boolean or datetime (depending on which one you pick)?
The app settings do not seem to be a recommended place according to MSDN:
Passing information between the foreground app and background agents
can be challenging because it is not possible to predict if the agent
and the app will run simultaneously. The following are recommended
patterns for this.
For Periodic and Resource-intensive Agents: Use LINQ 2 SQL or a file in isolated storage that is guarded with a Mutex. For
one-direction communication where the foreground app writes and the
agent only reads, we recommend using an isolated storage file with a
Mutex. We recommend that you do not use IsolatedStorageSettings to
communicate between processes because it is possible for the data to
become corrupt.
A simple file in isolated storage should get the job done.
If you're going by date (once per day) and it's valid that the task can run at 11pm on a day and 1am the next, then after the agent has run you could store the current date (forgetting about time). Then whenever the agent runs again in 30 minutes, check if the date the task last ran is the same as the current date.
protected override void OnInvoke(ScheduledTask task)
{
    var settings = IsolatedStorageSettings.ApplicationSettings;

    // On the very first run the key does not exist yet, so fall back to DateTime.MinValue
    // instead of letting the indexer throw a KeyNotFoundException.
    DateTime lastRunDate;
    if (!settings.TryGetValue("LastRunDate", out lastRunDate))
    {
        lastRunDate = DateTime.MinValue;
    }

    if (DateTime.Today.Subtract(lastRunDate).Days > 0)
    {
        // it's a later date than when the task last ran
        // DO STUFF!

        // save the date - we only care about the date part
        settings["LastRunDate"] = DateTime.Today;
        settings.Save();
    }

    NotifyComplete();
}
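The two ideas in this thread (check the hour, and remember the date of the last run) can be combined into one decision. The following is only a language-neutral sketch in Python, not Windows Phone code; `should_run`, `last_run` and `earliest` are hypothetical names, and it assumes the last-run date is persisted somewhere safe between invocations.

```python
from datetime import date, datetime, time
from typing import Optional


def should_run(now: datetime, last_run: Optional[date],
               earliest: time = time(1, 0)) -> bool:
    """Return True if the once-a-day action is due at `now`.

    `last_run` is the calendar date of the previous successful run,
    or None if the action has never run.
    """
    if now.time() < earliest:
        # Too early in the day: wait for a later invocation.
        return False
    # Due if it has never run, or last ran on an earlier calendar day.
    return last_run is None or last_run < now.date()
```

With this check, an invocation at 1:01 runs the action, while later invocations at 1:31 or 1:51 on the same day are skipped because the stored date already equals today's date.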
Apparently there is something in my code that gets stuck, because on occasion it causes the script to run until it times out at 6 minutes.
I am still trying to find that code, but in the meantime I really need to prevent the waiting. My script typically needs only 10 seconds.
Is there a way to make any script that reaches 10 seconds terminate?
Much appreciated!
First of all, isn't it better to rewrite the code so that it does not reach 6 minutes?
Reference URL
If there is a possibility that it will exceed 6 minutes, try using a trigger to execute it for more than 6 minutes.
Reference URL
If there are multiple commands, or a command is executed repeatedly, then TheMaster's recommendation in the comments is worth considering.
This is how it would look.
function sampleFunction() {
  var startTime = Date.now();
  // do your thing, initialization, etc.
  while (true) {
    // do your thing, main process
    // if 10 seconds is done
    if ((Date.now() - startTime) >= 10000)
      return;
  }
}
Limitations:
If the command before the if condition checking the time takes 5 minutes, then you will be able to exit the script after that execution which will be 5 minutes later.
You can't stop a single command mid-execution. You need to wait for your if condition to be executed.
Things to note:
Exiting the script through the method above works best when many lines of code are repeated in a loop that consistently checks the if condition, so you can exit the script when the time comes.
You can only exit the script in between commands, not during those commands. So using this between commands that take too much time might stop the script later than expected.
It is still best to debug what is happening and why that "BUG" happens, or maybe you missed something that is causing the unexpected behavior.
Once you provide the exact code, the community will be able to help you with a more specific answer. Until then, only speculation can be offered.
We used to use the guava cache and we want to change it to caffeine.
We want to set for each entity its own "expiration time", something like - put(K key, V value, long expiration_time).
I saw the 3 functions above and I wonder what exactly they are doing. If you can explain the meaning and the operation of each one of them, that would be great.
For example, should the return value of expireAfterCreate be the duration we want for this entity from its creation until its expiration? Or something else?
I'm also wondering why we have the parameter "currentTime" in both expireAfterRead and expireAfterUpdate if we don't use it in the function?
When we used the guava cache we used the expireAfterAccess, what is the substitution for it in caffeine?
My last question is how can I set a default value for entities without a unique expiration time.
Thank you,
May
When we used the guava cache we used the expireAfterAccess, what is the substitution for it in caffeine?
We mirror the Guava API, so this is also available on the cache builder.
My last question is how can I set a default value for entities without a unique expiration time.
Use expireAfterAccess, expireAfterWrite, or return a constant duration with expireAfter(Expiry).
I saw the 3 functions above and I wonder what exactly they are doing. If you can explain the meaning and the operation of each one of them, that would be great.
Expiry is a callback interface where a single timestamp value is updated. The invoked method corresponds to the operation performed on the cache entry (created, updated, read). An update or read that should have no effect can return currentDuration to no-op.
For example, should the return value of expireAfterCreate be the duration we want for this entity from its creation until its expiration? Or something else?
Yes. However if the expireAfterUpdate returns a custom value (something other than currentDuration), then that overrides the prior expiration duration.
I'm also wondering why we have the parameter "currentTime" in both expireAfterRead and expireAfterUpdate if we don't use it in the function?
This can most often be ignored, but is provided if somehow useful. It is the current nano timestamp from the Ticker (not wall clock time).
We want to set for each entity its own "expiration time", something like - put(K key, V value, long expiration_time).
The callback Expiry is required and generally recommended, because ideally entries are loaded through the cache to avoid stampedes (e.g. LoadingCache). A stampede is when multiple threads look up the same entry, miss, load it, and overwrite each other when putting it in. That is wasted work compared to having only one thread perform the load while the others wait for the result.
That said, this method is available under Cache.policy().expiresVariably(). Those configuration-specific methods are stashed in that area to offer more power when deemed necessary.
Thank you,
You're very welcome.
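To make the callback semantics described in this answer concrete, here is a toy model in Python. It is not Caffeine's API or implementation: `ToyCache`, `Expiry` and `PerEntryExpiry` are hypothetical names, durations are in seconds rather than nanoseconds, and the sketch assumes that the "current duration" passed to the update and read callbacks is the time remaining until expiration.

```python
import time


class Expiry:
    """Each callback returns the entry's new time-to-live in seconds."""

    def expire_after_create(self, key, value, now):
        raise NotImplementedError

    def expire_after_update(self, key, value, now, current_ttl):
        return current_ttl  # returning the current duration is a no-op

    def expire_after_read(self, key, value, now, current_ttl):
        return current_ttl  # returning the current duration is a no-op


class PerEntryExpiry(Expiry):
    """Entries carry their own TTL; others fall back to a default."""

    DEFAULT_TTL = 60.0  # assumed default for entries without a TTL

    def expire_after_create(self, key, value, now):
        return value.get("ttl", self.DEFAULT_TTL)


class ToyCache:
    def __init__(self, expiry, clock=time.monotonic):
        self._expiry = expiry
        self._clock = clock
        self._data = {}  # key -> (value, deadline)

    def put(self, key, value):
        now = self._clock()
        entry = self._data.get(key)
        if entry is None or entry[1] <= now:
            # Absent or already expired: treat as a creation.
            ttl = self._expiry.expire_after_create(key, value, now)
        else:
            ttl = self._expiry.expire_after_update(
                key, value, now, entry[1] - now)
        self._data[key] = (value, now + ttl)

    def get(self, key):
        now = self._clock()
        entry = self._data.get(key)
        if entry is None:
            return None
        value, deadline = entry
        if deadline <= now:
            del self._data[key]  # expired: drop it and report a miss
            return None
        ttl = self._expiry.expire_after_read(key, value, now, deadline - now)
        self._data[key] = (value, now + ttl)
        return value
```

In this model only one deadline is stored per entry, and whichever callback fires last decides it, which mirrors the "single timestamp value is updated" description above. A per-entry TTL and a default for entries without one both fall out of what `expire_after_create` returns.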
I am trying to ingest data from a 3rd party API into a Dataflow pipeline. Since the 3rd party doesn't make webhooks available, I wrote a custom script that constantly polls their endpoint for more data.
The data is refreshed every 15 minutes, but since I don't want to miss any datapoints and I want to consume as soon as new data is available, my "crawler" runs every 1 minute. The script then sends the data to a PubSub topic. Easy to see that PubSub will receive about 15 repeated messages for each datapoint in the source.
My first attempt to identify and discard those repeated messages was to add a custom attribute to each PubSub message (eventid), created from a hash of its [ID + updated_time] at source.
const attributes = {
  eventid: Buffer.from(`${item.lastupdate}|${item.segmentid}`).toString('base64'),
  timestamp: item.timestamp.toString()
};
const dataBuffer = Buffer.from(JSON.stringify(item));
publisher.publish(dataBuffer, attributes);
Then I configured Dataflow with a withIdAttribute() (which is the new idLabel(), based on Record IDs).
PCollection<String> input = p
    .apply("ReadFromPubSub", PubsubIO
        .readStrings()
        .fromTopic(String.format("projects/%s/topics/%s", options.getProject(), options.getIncomingDataTopic()))
        .withTimestampAttribute("timestamp")
        .withIdAttribute("eventid"))
    .apply("OutputToBigQuery", ...)
With that implementation, I was expecting that when the script sends the same datapoint a second time, the repeated eventid would be the same and the message discarded. But for some reason, I still see duplicates on the output dataset.
Some questions:
Is there a clever way to ingest the data to dataflow from that 3rd party API if they don't provide webhooks?
Any ideas on why dataflow is not discarding the messages on this situation?
I know about the 10-minute restriction for deduplication on dataflow, but I see duplicated data even on the 2nd insertion (2 minutes).
Any help will be greatly appreciated!
I think you are on the right track, but instead of the hash I recommend using timestamps. A better way to do this is by using windows. Review this document, which filters out data that falls outside of the window.
Regarding the additional duplicate data: if you are using pull subscriptions and the acknowledgement deadline is reached before the data has been processed, the message will be resent, as per at-least-once delivery. In that case, change the acknowledgement deadline; the default is 10 seconds.
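Separately from the suggestions above, the duplicates could also be reduced before they ever reach Pub/Sub, since the crawler itself knows which datapoints it has already sent. The following is only a sketch of that idea, assuming the crawler can keep a small amount of state between polls; `event_id` and `drop_repeats` are hypothetical helpers, and the id is built the same way as the `eventid` attribute in the question.

```python
import base64
from typing import Dict, Iterable, Iterator, Set


def event_id(item: Dict) -> str:
    """Deterministic id from lastupdate and segmentid, as in the question."""
    raw = f"{item['lastupdate']}|{item['segmentid']}"
    return base64.b64encode(raw.encode("utf-8")).decode("ascii")


def drop_repeats(items: Iterable[Dict], seen: Set[str]) -> Iterator[Dict]:
    """Yield only items whose event id has not been seen in earlier polls."""
    for item in items:
        eid = event_id(item)
        if eid in seen:
            continue  # same datapoint as in a previous poll: skip it
        seen.add(eid)
        yield item
```

The `seen` set grows without bound in this form, so a real crawler would need to expire old ids, for example by keeping only the latest `lastupdate` per `segmentid`.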
I am stuck in a situation where I initialised a namespace with default-ttl set to 30 days. About 5 million records were written with that (30-day) TTL value. My actual requirement is that the TTL should be zero (0), but the 30-day TTL was left in place without my noticing.
So now I want to update the old 5 million records with the new TTL value (zero).
I've tried "set-disable-eviction true", but it is not working; the data is still being removed according to the old TTL value.
How do I overcome this? (And how can I retrieve the removed data?)
Can someone help me?
First, eviction and expiration are two different mechanisms. You can disable evictions in various ways, such as the set-disable-eviction config parameter you've used. You cannot disable the cleanup of expired records. There's a good knowledge base FAQ, "What are Expiration, Eviction and Stop-Writes?".
Unfortunately, the expired records that have been cleaned up are gone if their void time is in the past. If those records were merely evicted (i.e. removed before their void time due to crossing the namespace high-water mark for memory or disk), you can cold restart your node, and those records with a future TTL will come back. They won't return if they were durably deleted or if their TTL is in the past (such records get skipped).
As for resetting TTLs, the easiest way would be to do this through a record UDF that is applied to all the records in your namespace using a scan.
The UDF for your situation would be very simple:
ttl.lua
function to_zero_ttl(rec)
  local rec_ttl = record.ttl(rec)
  if rec_ttl > 0 then
    record.set_ttl(rec, -1)
    aerospike:update(rec)
  end
end
In AQL:
$ aql
Aerospike Query Client
Version 3.12.0
C Client Version 4.1.4
Copyright 2012-2017 Aerospike. All rights reserved.
aql> register module './ttl.lua'
OK, 1 module added.
aql> execute ttl.to_zero_ttl() on test.foo
Using a Python script would be easier if you have more complex logic, with filters etc.
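import time

import aerospike
from aerospike_helpers.operations import operations

# Assumes `client` is a connected Aerospike client and that
# `namespace` and `set_name` are already defined.
zero_ttl_operation = [operations.touch(-1)]
query = client.query(namespace, set_name)
query.add_ops(zero_ttl_operation)
policy = {}
job = query.execute_background(policy)
print(f'executing job {job}')

# Poll the background job until it is no longer in progress.
while True:
    response = client.job_info(job, aerospike.JOB_SCAN, policy={'timeout': 60000})
    print(f'job status: {response}')
    if response['status'] != aerospike.JOB_STATUS_INPROGRESS:
        break
    time.sleep(0.5)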
Aerospike v6 and Python SDK v7.
I use Beanstalkd and the Yii2 framework.
To add a job to the queue I use something like this:
Yii::$app->beanstalk
->putInTube('tube2', ['param' => 'val'], PheanstalkInterface::DEFAULT_PRIORITY, PheanstalkInterface::DEFAULT_DELAY);
But now I need to schedule a task to run at a specified time. Is that possible with Beanstalkd, or do I need something like Resque?
You can schedule a task for a specified time by calculating the delay and passing it as the delay parameter in your example above.
Alternatively, you could store time-based lists in, for example, Redis, and have a cron job that reads the due entries every minute and loads the jobs into beanstalkd.
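The delay calculation mentioned in the answer is just the difference between the target time and the current time, in whole seconds. A minimal sketch in Python (the thread itself uses PHP; `delay_seconds` is a hypothetical helper, and it assumes both timestamps are in the same timezone):

```python
import math
from datetime import datetime, timezone


def delay_seconds(run_at: datetime, now: datetime) -> int:
    """Whole seconds to wait from `now` until `run_at`, never negative."""
    remaining = (run_at - now).total_seconds()
    # Round up so the job never becomes ready before the target time;
    # a target in the past yields 0, i.e. run as soon as possible.
    return max(0, math.ceil(remaining))


now = datetime(2024, 1, 1, 12, 59, 30, tzinfo=timezone.utc)
run_at = datetime(2024, 1, 1, 13, 0, 0, tzinfo=timezone.utc)
delay = delay_seconds(run_at, now)  # 30 seconds in this example
```

The resulting number would take the place of the default delay argument when the job is put into the tube.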