Parse/evaluate/generate crontab expressions outside of Linux? - language-agnostic

I'm building some software that needs a scheduling input, and I'd really like to reuse the design of crontab because it simply works.
Crontab expressions can be really simple, like */5 * * * * ("run every five minutes"), or more complex, like 2-59/3 1,9,22 11-26 1-6 ? 2003 ("in 2003, on the 11th to the 26th of each month from January to June, every third minute starting from 2 minutes past 1am, 9am and 10pm").
I am not looking to use the Linux software called crontab; I'm seeking a way to evaluate these expressions correctly (for instance, output the next 25 timestamps that match the expression, or generate an expression from some abstracted GUI for the users).
I can't really find any libraries or functions that do this in JavaScript or PHP, or even in other languages. If they don't exist, what would be a good method to do this? I already know an overly complicated regular expression is likely to be the wrong answer. I'm also having a hard time finding the C source code in crontab that does this task, which makes me wonder whether it even happens there.

To output the next 25 timestamps that match the crontab expression you could use the crontab Python module:
from datetime import datetime, timedelta
import crontab

tab = crontab.CronTab('2-59/3 1,9,22 11-26 1-6 ? 2012')
dt = datetime.now()
for _ in range(25):
    delay = tab.next(dt)  # seconds before this crontab entry can be executed
    dt += timedelta(seconds=delay)
    print(dt)
Output
2012-01-11 22:41:00
2012-01-11 22:44:00
2012-01-11 22:47:00
2012-01-11 22:50:00
2012-01-11 22:53:00
2012-01-11 22:56:00
2012-01-11 22:59:00
2012-01-12 01:02:00
2012-01-12 01:05:00
2012-01-12 01:08:00
2012-01-12 01:11:00
2012-01-12 01:14:00
2012-01-12 01:17:00
2012-01-12 01:20:00
2012-01-12 01:23:00
2012-01-12 01:26:00
2012-01-12 01:29:00
2012-01-12 01:32:00
2012-01-12 01:35:00
2012-01-12 01:38:00
2012-01-12 01:41:00
2012-01-12 01:44:00
2012-01-12 01:47:00
2012-01-12 01:50:00
2012-01-12 01:53:00
There is also python-crontab, which provides a crontab module with richer functionality (parsing and generating entries).
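For the generate/parse side, a rough sketch of what python-crontab usage might look like (the command, comment and schedule below are made-up examples; also note that python-crontab and the crontab module above both install a top-level module named crontab, so only one of them can be importable at a time):

from crontab import CronTab  # python-crontab, not the crontab module used above

cron = CronTab(user=True)  # load the current user's crontab
job = cron.new(command='/usr/bin/backup.sh', comment='nightly backup')
job.setall('2-59/3 1,9,22 11-26 1-6 *')  # set the schedule from a cron expression
if job.is_valid():
    cron.write()  # write the modified crontab back

for job in cron:  # parsing side: iterate the existing entries
    print(job.slices, job.command)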

There is a Java library, part of the Quartz Scheduler, which can be used to evaluate cron expressions quite easily.
Its CronExpression class provides methods such as isSatisfiedBy(Date date) and getNextValidTimeAfter(Date date), which are very useful.
The library is freely available.

Related

Import CSV timestamp with space

I have a CSV like this
timestamp,H_LOC20 (%RH),T_LOC20 (°C),P_LOC20 (Pa)
23 gen 2023 09:05:50 CET,"46,7","17,3","0,1"
23 gen 2023 09:06:00 CET,"46,7","17,3","0,1"
23 gen 2023 09:06:10 CET,"46,7","17,3","0,1"
23 gen 2023 09:06:20 CET,"46,7","17,3","0,1"
23 gen 2023 09:06:30 CET,"46,7","17,3","0,1"
23 gen 2023 09:06:40 CET,"46,7","17,3","0,1"
In Octave I use a simple csv2cell to obtain a cell array with all the columns.
How can I use the timestamp as time-series data?
I'm trying to use strftime with no luck.
strftime is for converting a time struct to a string.
You are trying to do the reverse. The corresponding function for that is strptime.
Note that Octave supports two datetime-management systems, a C-based one and a Matlab-compatible one; strftime and strptime belong to the former. This shouldn't matter to you, of course, unless you want Matlab-compatible code, in which case you should have a look at the datenum/datestr/datevec family. You might need to obtain all the date tokens from your string yourself, via something like strtok or strsplit.
PS. To complicate things further, I believe Matlab has now moved on from the above three functions to a 'Date' object based system ... but I haven't used that, and in any case it's not relevant to your situation within Octave.

Google Fit estimated steps through REST API decrease in time with some users

We are using the Google Fit REST API in a process with thousands of users to get daily steps. With most users the process is OK, although we are finding some users with this specific behaviour: their steps increase during the day, but at some point they decrease significantly.
We are seeing a few issues related to this, mainly with Huawei Health apps (and some Xiaomi health apps).
We use this dataSourceId to get daily steps: derived:com.google.step_count.delta:com.google.android.gms:estimated_steps
An example of one of our requests to get data for 15th March (Spanish time):
POST https://www.googleapis.com/fitness/v1/users/me/dataSources
Accept: application/json
Content-Type: application/json;encoding=utf-8
Authorization: Bearer XXXXXXX
{
  "aggregateBy": [{
    "dataTypeName": "com.google.step_count.delta",
    "dataSourceId": "derived:com.google.step_count.delta:com.google.android.gms:estimated_steps"
  }],
  "bucketByTime": { "durationMillis": 86400000 },
  "startTimeMillis": 1615244400000,
  "endTimeMillis": 1615330800000
}
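(For reference, the same aggregate call can be scripted; a minimal sketch in Python with the requests library, assuming a valid OAuth 2.0 access token with the fitness.activity.read scope, and using the users/me/dataset:aggregate endpoint from the read-daily-step-total scenario linked further down.)

import requests

access_token = "XXXXXXX"  # placeholder; a valid OAuth 2.0 token is assumed
body = {
    "aggregateBy": [{
        "dataTypeName": "com.google.step_count.delta",
        "dataSourceId": "derived:com.google.step_count.delta:com.google.android.gms:estimated_steps"
    }],
    "bucketByTime": {"durationMillis": 86400000},
    "startTimeMillis": 1615244400000,
    "endTimeMillis": 1615330800000,
}
resp = requests.post(
    "https://www.googleapis.com/fitness/v1/users/me/dataset:aggregate",
    headers={"Authorization": f"Bearer {access_token}"},
    json=body,
)
resp.raise_for_status()
print(resp.json())  # one bucket per day with the aggregated step datapoints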
With most users this goes well (it returns the same data that is shown to the user in the Google Fit app), but with some users, as described, the numbers increase during the day at first and decrease later. For some of these users, the data in the Google Fit app is significantly greater than what we get through the REST API.
We have even traced this with a specific user during the day. Using buckets of 'durationMillis': 3600000, we have painted a histogram of hourly steps in one day (with a custom-made process).
For the same day, at different moments in time (a couple of hours apart in this case), we get this for the EXACT SAME USER:
20210315-07 | ########################################################## | 1568
20210315-08 | ############################################################ | 1628
20210315-09 | ########################################################## | 1574
20210315-10 | ####################### | 636
20210315-11 | ################################################### | 1383
20210315-12 | ###################################################### | 1477
20210315-13 | ############################################### | 1284
20210315-14 | #################### | 552
vs. this, which was retrieved A COUPLE OF HOURS LATER:
20210315-08 | ################# | 430
20210315-09 | ######### | 229
20210315-10 | ################# | 410
20210315-11 | ###################################################### | 1337
20210315-12 | ############################################################ | 1477
20210315-13 | #################################################### | 1284
20210315-14 | ###################### | 552
("20210315-14" means 14:00 on 15th March 2021)
This is the returning JSON in the first case:
[{"startTimeNanos":"1615763400000000000","endTimeNanos":"1615763460000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":6,"mapVal":[]}]},
{"startTimeNanos":"1615788060000000000","endTimeNanos":"1615791600000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1568,"mapVal":[]}]},
{"startTimeNanos":"1615791600000000000","endTimeNanos":"1615795080000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1628,"mapVal":[]}]},
{"startTimeNanos":"1615795200000000000","endTimeNanos":"1615798500000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1574,"mapVal":[]}]},
{"startTimeNanos":"1615798860000000000","endTimeNanos":"1615802400000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":636,"mapVal":[]}]},
{"startTimeNanos":"1615802400000000000","endTimeNanos":"1615806000000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1383,"mapVal":[]}]},
{"startTimeNanos":"1615806000000000000","endTimeNanos":"1615809480000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1477,"mapVal":[]}]},
{"startTimeNanos":"1615809660000000000","endTimeNanos":"1615813200000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1284,"mapVal":[]}]},
{"startTimeNanos":"1615813380000000000","endTimeNanos":"1615815420000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":552,"mapVal":[]}]}]
This is the returning JSON in the latter case:
[{"startTimeNanos":"1615788300000000000","endTimeNanos":"1615791600000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":517,"mapVal":[]}]},
{"startTimeNanos":"1615791600000000000","endTimeNanos":"1615794540000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":430,"mapVal":[]}]},
{"startTimeNanos":"1615796400000000000","endTimeNanos":"1615798200000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":229,"mapVal":[]}]},
{"startTimeNanos":"1615798980000000000","endTimeNanos":"1615802400000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":410,"mapVal":[]}]},
{"startTimeNanos":"1615802400000000000","endTimeNanos":"1615806000000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1337,"mapVal":[]}]},
{"startTimeNanos":"1615806000000000000","endTimeNanos":"1615809480000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1477,"mapVal":[]}]},
{"startTimeNanos":"1615809660000000000","endTimeNanos":"1615813200000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1284,"mapVal":[]}]},
{"startTimeNanos":"1615813380000000000","endTimeNanos":"1615815420000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":552,"mapVal":[]}]}]
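(An hourly histogram like the ones above can be derived from such datapoints by bucketing on the start time; a minimal sketch, assuming the datapoint list has already been parsed from the JSON, and using UTC rather than the Spanish local time used for the labels in the question.)

from collections import defaultdict
from datetime import datetime, timezone

def hourly_histogram(points):
    # Sum step deltas per hour of their start time, labelled like '20210315-14'.
    totals = defaultdict(int)
    for p in points:
        start = datetime.fromtimestamp(int(p["startTimeNanos"]) / 1e9, tz=timezone.utc)
        totals[start.strftime("%Y%m%d-%H")] += p["value"][0]["intVal"]
    peak = max(totals.values())
    for label in sorted(totals):
        bar = "#" * round(60 * totals[label] / peak)
        print(f"{label} | {bar} | {totals[label]}")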
As you can see, all points always come from originDataSourceId: "raw:com.google.step_count.delta:com.huawei.health".
It looks like some Google Fit process is doing some kind of adjustment, removing some steps or datapoints, although we cannot find a way to detect what is removed or why, and we cannot explain to the user what is happening or what he or we can do to make his app data exactly match ours (or the other way around). His Google Fit app shows a number that is not the same as the one the REST API returns.
The user has already disabled the "Google Fit app tracking activities" option.
I would love to know, or at least get some hints about:
What can I do to debug this further?
Any hint about why this is happening?
Is there any way, from a configuration point of view (for the user), to prevent this from happening?
Is there any way, from a development point of view, to prevent this from happening?
Thanks and regards.
UPDATE AFTER Andy Turner's question (thanks for the comment!).
We were able to "catch" this over several hours: 18.58 (around 6K steps), 21.58 (around 25K steps), 22.58 (around 17K steps), 23.58 (around 26K steps). We exported the datasets for those, and here is the result.
Another important piece of info: the data is coming only from "raw:com.google.step_count.delta:com.huawei.health". We went through other datasets that might look suspicious, and all were empty (apart from derived ones and so on).
If we interpret this correctly, it is probably Huawei that is sometimes sending one value and, the next time, something else; so it's probably some misconfiguration on the Huawei side.
Here are the datasets exported:
https://gist.github.com/jmarti-theinit/8d98996873a9c499a14899a9b62162f3
Result of the GIST is:
Length of 18.58 points 165
Length of 21.58 points 503
Length of 22.58 points 294
Length of 23.58 points 537
How many points in 21.58 that exist in 18.58 => 165
How many points in 22.58 that exist in 18.58 => 57
How many points in 22.58 that exist in 21.58 => 294
How many points in 23.58 that exist in 18.58 => 165
How many points in 23.58 that exist in 21.58 => 503
How many points in 23.58 that exist in 22.58 => 294
So our bet is that points are removed and added by the devices behind Huawei (for example, only 57 points are common between 18.58 and 22.58), and we cannot control anything more from Google Fit's side. Is that correct? Is there anything else we could look at?
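(For reference, a comparison like the one in the gist can be sketched by keying every datapoint on its time range; the file names below are hypothetical and assume each export has been saved as a JSON list of datapoints.)

import json

def load_points(path):
    # Key each exported datapoint by its (start, end) time range.
    with open(path) as f:
        return {(p["startTimeNanos"], p["endTimeNanos"]) for p in json.load(f)}

early = load_points("export_1858.json")
late = load_points("export_2258.json")
print("Length of 18.58 points", len(early))
print("How many points in 22.58 that exist in 18.58 =>", len(late & early))
print("Points dropped between the two exports:", len(early - late))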
We're having similar issues using the REST API.
Here is what coincides with Jordi's case:
we are also from Spain (and our users too), although we use servers in Spain and the US
we get the same daily steps value as the Google Fit app for some users, but not for others
daily steps increase during the current day, but when we make the request again on the following days, the daily steps sometimes decrease
we are making the same request, from the start of day to the end of the day, with 86400000 as bucket time and same data type and data source id
We are in the final development phase, so we're testing with only a few users. Our users have Xiaomi Mi Band devices.
We think that the problem could be a desynchronization of the servers that we're hitting, because if we test with other apps like this one, they show the correct values. We've created another set of Google Cloud Console OAuth client credentials and new email accounts to test with brand new users and OAuth clients, but the results are the same.
This is the recommended way to get the daily steps and we are using exactly the same request:
https://developers.google.com/fit/scenarios/read-daily-step-total
and even with the "try it" option in the documentation the results are wrong.
What else can we do to help you resolve the issue?
Thank you very much!

How do I optimise displaying time durations in a human-readable way?

I want to implement a feature for my project. It's very similar to a feature on Stack Overflow, where users post questions and get responses. Here on Stack Overflow we see posts marked as 4 seconds ago, 22 seconds ago, 1 minute ago, 5 minutes ago, etc. I want to implement the same.
I am storing the request's posted time in a timestamp column in MySQL, then subtracting NOW() - stored_time to get the seconds. Then I write some logic, like:
if less than 60 seconds, display it in seconds ("N seconds ago")
if the difference is between 60 and 3600, display it in minutes
and so on. This long logic is written in Perl. I want to avoid that. Is there a better way to achieve the same thing? I am open to changing the MySQL table and data types.
Send the number of elapsed seconds to the client and convert it to human-readable text in JavaScript.
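Whichever side it runs on, the bucketing described in the question is only a handful of comparisons. A minimal sketch (shown in Python purely to illustrate the thresholds, not as a replacement for the Perl or JavaScript routes):

def time_ago(seconds):
    # Rough, human-readable age for a number of elapsed seconds.
    if seconds < 60:
        return f"{int(seconds)} seconds ago"
    if seconds < 3600:
        return f"{int(seconds // 60)} minutes ago"
    if seconds < 86400:
        return f"{int(seconds // 3600)} hours ago"
    return f"{int(seconds // 86400)} days ago"

print(time_ago(4))    # 4 seconds ago
print(time_ago(300))  # 5 minutes ago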
Retrieve the datestamps as DateTime objects. You don't show any details of your database, so I have to skip that step in my answer.
use DateTime qw();
use DateTime::Format::Human::Duration qw();

for my $seconds (555, 5555, 555555, 5555555) {
    my $now = DateTime->now;
    my $before = $now->clone->subtract(seconds => $seconds);
    my $formatted = DateTime::Format::Human::Duration
        ->new->format_duration($before - $now);
    $formatted =~ s/(?:,| and).*//;
    print "about $formatted ago\n";
}
# about 9 minutes ago
# about 1 hour ago
# about 6 days ago
# about 2 months ago

Creating index takes too long time

About 2 months ago, I imported the EnWikipedia data (http://dumps.wikimedia.org/enwiki/20120211/) into MySQL.
After finishing the import, I have been creating indexes on the tables of the EnWikipedia database in MySQL for about 2 months.
Now, I have reached the point of creating the index on "pagelinks".
However, it seems to be taking close to forever to get past that point.
Therefore, I estimated the time remaining to check whether my intuition was correct.
As a result, the expected time remaining is 60 days (assuming that I create the index on "pagelinks" again from the beginning).
My EnWikipedia database has 7 tables:
"categorylinks"(records: 60 mil, size: 23.5 GiB),
"langlinks"(records: 15 mil, size: 1.5 GiB),
"page"(records: 26 mil, size 4.9 GiB),
"pagelinks"(records: 630 mil, size: 56.4 GiB),
"redirect"(records: 6 mil, size: 327.8 MiB),
"revision"(records: 26 mil, size: 4.6 GiB) and "text"(records: 26 mil, size: 60.8 GiB).
My server is:
Linux version 2.6.32-5-amd64 (Debian 2.6.32-39), 16 GB memory, 2.39 GHz Intel 4-core
Is it a common phenomenon for index creation to take so many days?
Does anyone have a good solution for creating indexes more quickly?
Thanks in advance!
P.S.: I performed the following operations to estimate the time remaining.
Reference (sorry, the following page is written in Japanese): http://d.hatena.ne.jp/sh2/20110615
1st. I counted the records in "pagelinks".
mysql> select count(*) from pagelinks;
+-----------+
| count(*) |
+-----------+
| 632047759 |
+-----------+
1 row in set (1 hour 25 min 26.18 sec)
2nd. I measured how many records are added per minute.
getHandler_write.sh
#!/bin/bash
while true
do
cat <<_EOF_
SHOW GLOBAL STATUS LIKE 'Handler_write';
_EOF_
sleep 60
done | mysql -u root -p -N
command
$ sh getHandler_write.sh
Enter password:
Handler_write 1289808074
Handler_write 1289814597
Handler_write 1289822748
Handler_write 1289829789
Handler_write 1289836322
Handler_write 1289844916
Handler_write 1289852226
3rd. I computed the recording speed.
According to the result of step 2, the recording speed is
7233 records/minute
4th. Then the time remaining is
(632047759/7233)/60/24 = 60 days
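(The projection in steps 3 and 4 is easy to reproduce; a minimal sketch using the Handler_write samples listed in step 2. Averaging all six one-minute deltas gives a slightly different rate than the 7233 records/minute above, but roughly the same 60-day estimate.)

# Handler_write samples taken one minute apart (step 2) and the row count (step 1)
samples = [1289808074, 1289814597, 1289822748, 1289829789,
           1289836322, 1289844916, 1289852226]
total_rows = 632047759

per_minute = (samples[-1] - samples[0]) / (len(samples) - 1)
days_remaining = total_rows / per_minute / 60 / 24
print(f"{per_minute:.0f} records/minute, about {days_remaining:.0f} days remaining")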
Those are pretty big tables, so I'd expect the indexing to be pretty slow; 630 million records is a LOT of data to index. One thing to look at is partitioning: with data sets that large, and without correctly partitioned tables, performance will be sloooow. Here are some useful links:
using partitioning on slow indexes. You could also try looking at the buffer size settings for building the indexes (the default is 8 MB, so for your large tables that's going to slow you down a fair bit): buffer size documentation.

Ordering by created_at in unit tests with generated data in rails

I have a bit of code that basically displays the last x (variable, but let's say x is 20 here) updates made in a given table. In one of the unit tests for it, I have this snippet:
EditedItem.push_to_queue(hiddennow)
#create some new entries and save them
20.times{ EditedItem.push_to_queue(random_item) }
Queue.get_entries.each{|entry| assert_not_equal too_far_down, entry}
May or may not be pretty, but it gets the intention across. The hiddennow object has been pushed down in the queue too far and should no longer be returned when get_entries is called.
#this works
SearchObject.find(:all, :order => "id desc")
#this does not, unless the 20.times loop has sleep(1) or something
SearchObject.find(:all, :order => "created_at desc")
This is simplified down a bit, but it looks like the 20.times loop adds things fast enough that the ORDER BY clause on created_at cannot distinguish them. My questions are: am I doing something fundamentally wrong? If not, what is a better approach to writing a test along these lines?
DigitalRoss is right: created_at has a one-second granularity.
One option is to set the created_at when you create the objects:
old = EditedItem.new(:created_at => 1.second.ago)
older = EditedItem.new(:created_at => 2.seconds.ago)
Another option is to use stubbing to mess with the Time class. The following would work with RSpec, but could easily be accomplished with other mocking frameworks like Mocha.
@seconds = Time.now.to_i
Time.stub!(:now).and_return { Time.at(@seconds += 5) }
This will return a time 5 seconds greater than the previous each time you call Time.now.
I'd recommend the first approach if you can make it work, since it's more clear what you're doing and less likely to have unintended consequences.
Times related to files and records (and specifically those times in Rails) are typically kept in Unix time, or POSIX time. This format keeps the number of seconds since 1970 in an arithmetic type.
So, time for these purposes has a one-second granularity.
Rails can't order hiddennow vs. the random items without at least a one-second delay in between, and the set of 20 won't be ordered at all.
Are these answers still correct in Rails 5 or 6?
Suppose there is a legacy default scope on the User model:
# app/models/user.rb
class User < ApplicationRecord
  default_scope { order created_at: :desc }
end
The following RSpec test
describe 'ordering in rails' do
  before(:each) do
    (0..9).each_with_index do |i|
      create :user, email: "#{i}#example.com"
    end
  end

  it 'preserves order' do
    puts User.pluck(:id, :created_at, :email)
    expect(User.all.pluck(:email).map(&:first)).to eq %w(9 8 7 6 5 4 3 2 1 0)
  end
end
yields the following output:
7602
2020-01-07 09:33:14 UTC
9#example.com
7601
2020-01-07 09:33:14 UTC
8#example.com
7600
2020-01-07 09:33:14 UTC
7#example.com
7599
2020-01-07 09:33:14 UTC
6#example.com
7598
2020-01-07 09:33:14 UTC
5#example.com
7597
2020-01-07 09:33:14 UTC
4#example.com
7596
2020-01-07 09:33:14 UTC
3#example.com
7595
2020-01-07 09:33:14 UTC
2#example.com
7594
2020-01-07 09:33:14 UTC
1#example.com
7593
2020-01-07 09:33:14 UTC
0#example.com
.
Finished in 0.30216 seconds (files took 1.34 seconds to load)
1 example, 0 failures
So despite all the models being created in the same second, there is a consistent ordering. Looking at this Rails 6 merge request, it looks like by Rails 5 there is an implicit ordering on the primary key. I wonder if the id is being used to break ties in later versions of Rails?