Ordering by created_at in unit tests with generated data in rails - mysql

I have a bit of code that basically displays the last x (variable, but let's say x is 20 here) updates made in a given table. In one of the unit tests for it, I have this snippet:
EditedItem.push_to_queue(hiddennow)
#create some new entries and save them
20.times{ EditedItem.push_to_queue(random_item) }
Queue.get_entries.each{|entry| assert_not_equal hiddennow, entry}
May or may not be pretty, but it gets the intention across. The hiddennow object has been pushed down in the queue too far and should no longer be returned when get_entries is called.
#this works
SearchObject.find(:all, :order => "id desc")
#this does not, unless the 20.times loop has sleep(1) or something
SearchObject.find(:all, :order => "created_at desc")
This is simplified down a bit, but it looks like the 20.times loop adds things fast enough that the order by clause on created_at cannot distinguish. My questions are, am I doing something fundamentally wrong? If not, what is the better approach to writing a test along these lines?

DigitalRoss is right. created_at has a one second granularity.
One option is to set the created_at when you create the objects:
old = EditedItem.new(:created_at => 1.second.ago)
older = EditedItem.new(:created_at => 2.seconds.ago)
Another option is to actually use stubbing to mess with the Time class. The following would work with Rspec, but could be easily accomplished with other mocking frameworks like Mocha.
@seconds = Time.now.to_i
Time.stub!(:now).and_return{ Time.at(@seconds += 5) }
This will return a time 5 seconds greater than the previous each time you call Time.now.
I'd recommend the first approach if you can make it work, since it's more clear what you're doing and less likely to have unintended consequences.

Times related to files and records (and specifically those times in Rails) are typically kept in Unix time, or POSIX time. This format keeps the number of seconds since 1970 in an arithmetic type.
So, time for these purposes has a 1 second granularity.
Rails can't order hiddennow vs the random items without at least a one second delay in between, and the set of 20 won't be ordered at all.
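To illustrate why a tie-breaker matters when timestamps only have one-second resolution, here is a minimal sketch in Python (not Ruby, and not how Rails does it internally). The row dictionaries and the `newest_first` helper are hypothetical names made up for this example; the idea is simply to sort on `created_at` first and fall back to the auto-incrementing `id` when the timestamps are equal.

```python
from datetime import datetime

def newest_first(rows):
    """Sort rows newest first, using id to break created_at ties.

    Assumes ids are handed out in insertion order (hypothetical schema).
    """
    return sorted(rows, key=lambda r: (r["created_at"], r["id"]), reverse=True)

# Three rows inserted within the same second: created_at alone cannot
# distinguish them, but (created_at, id) can.
same_second = datetime(2020, 1, 7, 9, 33, 14)
rows = [
    {"id": 2, "created_at": same_second},
    {"id": 1, "created_at": same_second},
    {"id": 3, "created_at": same_second},
]

ordered_ids = [r["id"] for r in newest_first(rows)]
print(ordered_ids)
```

The SQL counterpart of this idea would be ordering by "created_at desc, id desc" instead of created_at alone.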

Are these answers still correct in rails 5 or 6?
Suppose there is a legacy default scope on the User model:
#app/models/user.rb
class User
default_scope { order created_at: :desc }
end
The following rspec test
describe 'ordering in rails' do
before(:each) do
(0..9).each do |i|
create :user, email: "#{i}@example.com"
end
end
it 'preserves order' do
puts User.pluck(:id, :created_at, :email)
expect(User.all.pluck(:email).map(&:first)).to eq %w(9 8 7 6 5 4 3 2 1 0)
end
end
yields the following output:
7602
2020-01-07 09:33:14 UTC
9@example.com
7601
2020-01-07 09:33:14 UTC
8@example.com
7600
2020-01-07 09:33:14 UTC
7@example.com
7599
2020-01-07 09:33:14 UTC
6@example.com
7598
2020-01-07 09:33:14 UTC
5@example.com
7597
2020-01-07 09:33:14 UTC
4@example.com
7596
2020-01-07 09:33:14 UTC
3@example.com
7595
2020-01-07 09:33:14 UTC
2@example.com
7594
2020-01-07 09:33:14 UTC
1@example.com
7593
2020-01-07 09:33:14 UTC
0@example.com
.
Finished in 0.30216 seconds (files took 1.34 seconds to load)
1 example, 0 failures
So despite all the models being created within the same second, the ordering is consistent. Looking at this Rails 6 merge request, it looks like by Rails 5 there is an implicit ordering on the primary key. I wonder if the id is being used to break ties in later versions of Rails?


How to avoid target "leaking" into the training process with PyTorch's TimeSeriesDataSet and TemporalFusionTransformer?

Basically, I have a time series dataset from which I want to predict the price 14 days from now. There are dozens of features and everything is being trained with TemporalFusionTransformer (PyTorch version).
Originally, I trained to predict price every day for the next 14 days, however these gave fairly poor results. So, I want to switch to predict ONLY the price 14 days from now. For a conventional dataset I would just lag the price variable 14 days (dataset['price_lagged'] = dataset['price'].shift(-13) and set max_prediction_length to 1 to predict the 14th day) and set this as a target variable.
Current code that still predicts well (and it shouldn't):
training = TimeSeriesDataSet(
dataset[lambda x: x.day <= training_cutoff],
time_idx="day",
target="price_lagged",
group_ids=["group_id"],
min_encoder_length=max_encoder_length,
max_encoder_length=max_encoder_length,
max_prediction_length=max_prediction_length,
static_categoricals=[],
static_reals=[],
time_varying_known_categoricals=[],
time_varying_unknown_reals=[
#I removed EVERYTHING from here
],
time_varying_unknown_categoricals=[
],
time_varying_known_reals=[
],
target_normalizer=EncoderNormalizer(),
#lags=lags,
add_relative_time_idx=True,
allow_missing_timesteps=False,
add_target_scales=True,
)
However, the model seems to use the target as a variable as well. I can see this if I literally remove every other feature, I still get a fairly good prediction. How can I explicitly have the model NOT use the target as a (obvious) feature? Can TFT work like this?

How do I optimise displaying time durations in a human-readable way?

I want to implement a feature for my project. It's very similar to a feature on Stack Overflow, where users post requests and get responses. Here on Stack Overflow we see posts marked as 4 seconds ago, 22 seconds ago, 1 minute ago, 5 minutes ago etc. I want to implement the same.
I am storing the request posted time in a timestamp variable in MySQL, then subtracting NOW() - stored_time to get the seconds. Then writing some logic, like
if less than 60 seconds, display in seconds
if difference in between 60 to 3600, display in minutes
and so on. This long logic is written in Perl. I want to avoid that. Is there any good way to achieve the same thing? I am open to changing the MySQL table and data type.
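For reference, the threshold logic described in the question can be kept quite short if the units live in a table instead of a chain of if-statements. This is only a sketch in Python, not the asker's Perl; the function name `time_ago`, the unit list, and the "just now" wording for zero seconds are assumptions made for the example.

```python
# Units from largest to smallest; extend with weeks/months if needed.
UNITS = [
    (86400, "day"),
    (3600, "hour"),
    (60, "minute"),
    (1, "second"),
]

def time_ago(elapsed_seconds):
    """Turn a number of elapsed seconds into text like '5 minutes ago'."""
    elapsed = max(0, int(elapsed_seconds))
    for size, name in UNITS:
        if elapsed >= size:
            count = elapsed // size  # whole units only, remainder is dropped
            plural = "" if count == 1 else "s"
            return f"{count} {name}{plural} ago"
    return "just now"

for seconds in (4, 22, 60, 300, 7200):
    print(seconds, "->", time_ago(seconds))
```

The input would be the difference you already compute in MySQL (NOW() minus the stored timestamp, in seconds).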
Send number of elapsed seconds to client and convert it to human-readable text in JavaScript.
Retrieve the datestamps as DateTime objects. You don't show any details of your database, so I have to skip that step in my answer.
use DateTime qw();
use DateTime::Format::Human::Duration qw();
for my $seconds (555, 5555, 555555, 5555555) {
my $now = DateTime->now;
my $before = $now->clone->subtract(seconds => $seconds);
my $formatted = DateTime::Format::Human::Duration
->new->format_duration($before - $now);
$formatted =~ s/(?:,| and).*//;
print "about $formatted ago\n";
}
# about 9 minutes ago
# about 1 hour ago
# about 6 days ago
# about 2 months ago

Calculate date of birth from age at specific date [MySQL or Perl]

Apologies if this is a really simple question but I am interested in trying to reach an accurate answer and not just a "rounded" up answer.
My problem is: I know somebody is 27.12 on the 18th of March 2008 (random example). How can I calculate, to the nearest approximation, his date of birth. Age is always provided as a real number to two decimal points.
The solutions through simple fractional calculation are 1981-02-03 and the day before, due to rounding. As eumiro said, the resolution of 1/100 year is not precise enough, so it might still be off a day or two with the real date.
use DateTime qw();
use POSIX qw(modf);
my $date = DateTime->new(year => 2008, month => 3, day => 18); # 2008-03-18
my $age = 27.12; # 27.12
my ($days, $years) = modf $age; # (0.12, 27)
$days *= 365.25; # 43.83
# approx. number of days in a year, is accurate enough for this purpose
$date->clone->subtract(years => $years, days => $days); # 1981-02-03
$date->clone->subtract(years => $years, days => 1 + $days); # 1981-02-02
eumiro's answer does the trick; the following, using the Time::Piece module (bundled with Perl since 5.10) is perhaps more maintainable.
use strict;
use warnings;
use 5.010;
use Time::Piece;
use Time::Seconds;
my ($date, $age) = ('2008-03-18', 27.12);
my $birthday = Time::Piece->strptime($date, '%Y-%m-%d') - $age*ONE_YEAR;
say $birthday->ymd();
This will get you within a few days of the actual birthday, due to the lack of accuracy (1/100 year) in the age.
use strict;
use Time::Local;
my $now = timelocal(0, 0, 12, 18, 3-1, 2008);
my $birthday = $now - 27.12 * 365.25 * 86400;
print scalar localtime $birthday;
returns Mon Feb 2 22:04:48 1981.
Your precision is 0.01 year, which is roughly 3 days, so you cannot even cover all birthdays.
My method does not cover leap years very well, but you cannot really calculate exactly with them. Imagine the 01-March-2008. What date was "1 year and 1 day" before this date? 28-February-2007 or the nonexistent 29-February-2007?
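Because the age is only given to 0.01 year, it may be more honest to report a window of possible birth dates than a single day. The following Python sketch does that under the same assumption as the answers above (a year is taken as 365.25 days); `birth_date_window` and its `half_step` parameter are hypothetical names for this example, and whole days are used by truncating the fractional part.

```python
from datetime import date, timedelta

DAYS_PER_YEAR = 365.25  # same approximation as in the answers above

def birth_date_window(on_date, age_years, half_step=0.005):
    """Return (earliest, estimate, latest) birth dates.

    half_step is half of the age resolution: an age printed as 27.12
    could be anything from about 27.115 to 27.125.
    """
    def go_back(age):
        # Truncate to whole days before subtracting from the date.
        return on_date - timedelta(days=int(age * DAYS_PER_YEAR))

    return (
        go_back(age_years + half_step),  # oldest possible person
        go_back(age_years),
        go_back(age_years - half_step),  # youngest possible person
    )

earliest, estimate, latest = birth_date_window(date(2008, 3, 18), 27.12)
print(earliest, estimate, latest)
```

With the example from the question the middle estimate agrees with the 1981-02-03 given above, and the window spans a few days on either side of it.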
A method that permits greater accuracy simply takes advantage of existing MySQL Date/Time functions. If working inside the MySQL, you can calculate the age with great precision by converting each of two dates to seconds in the TO_SECONDS() conversion and then manipulating the results to the desired precision. In these cases, the dates are in 'yyyy-mm-dd hh:mm:ss' formats and a year is assumed to have mean length of 365.242 days.
ROUND((TO_SECONDS(AnyDateTime) - TO_SECONDS(DateOfBirth))/(365.242*60*60*24),3) as age,
e.g.:
ROUND((TO_SECONDS('2013-01-01 00:00:00') - TO_SECONDS('1942-10-16'))/(365.242*60*60*24),3) AS age --> 70.214
Alternatively you can use the DATEDIFF() conversion which provides the answer in days:
ROUND(DATEDIFF('2013-01-01 00:00:00','1942-10-16')/365.242,3) AS age --> 70.214
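If you want to cross-check the DATEDIFF() arithmetic outside the database, the same calculation is easy to reproduce. This is a sketch in Python using the same assumed mean year length of 365.242 days; `age_in_years` is a hypothetical helper name, not a MySQL or library function.

```python
from datetime import date

def age_in_years(born, on, days_per_year=365.242):
    """Whole days between the two dates divided by the mean year length."""
    return round((on - born).days / days_per_year, 3)

age = age_in_years(date(1942, 10, 16), date(2013, 1, 1))
print(age)
```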

Calculate date from numeric value

The number 71867806 represents the present day, with the smallest unit of days.
Sorry guys, caching owned me; it's actually milliseconds!
How can I
calculate the current date from it?
(or) convert it into an Unix timestamp?
Solution shouldn't use language depending features.
Thanks!
This depends on:
What unit this number represents (days, seconds, milliseconds, ticks?)
When the starting date was
In general I would discourage you from trying to reinvent the wheel here, since you will have to handle every single exception in regards to dates yourself.
If it's truly an integer number of days, and the number you've given is for today (April 21, 2010, for me as I'm reading this), then the "zero day" (the epoch) was obviously enough 71867806 days ago. I can't quite imagine why somebody would pick that though -- it works out to roughly 196,763 years ago (~194,753 BC, if you prefer). That seems like a strange enough time to pick that I'm going to guess that there's more to this than what you've told us (perhaps more than you know about).
It seems to me the first thing to do is verify that the number does increase by one every 24 hours. If at all possible keep track of the exact time when it does increment.
First, you have only one point, and that's not quite enough. Get the number for "tomorrow" and see if that's 71867806+1. If it is, then you can safely bet that +1 means +1 day. If it's something like tomorrow-today = 24, then odds are +1 means +1 hour, and the logic to display days only shows you the "day" part. If it's something else check to see if it's near (24*60, which would be minutes), (24*60*60, which would be seconds), or (24*60*60*1000, which would be milliseconds).
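The check described above (compare two readings taken a day apart and see which unit the difference is close to) can be sketched like this in Python. The names `UNITS_PER_DAY` and `guess_unit` and the 5% tolerance are assumptions for illustration only; real readings will rarely be exactly 24 hours apart, which is why some tolerance is allowed.

```python
# How much the counter should grow in 24 hours for each candidate unit.
UNITS_PER_DAY = {
    "days": 1,
    "hours": 24,
    "minutes": 24 * 60,
    "seconds": 24 * 60 * 60,
    "milliseconds": 24 * 60 * 60 * 1000,
}

def guess_unit(value_today, value_tomorrow, tolerance=0.05):
    """Return the unit whose per-day increment matches the observed one.

    Returns None when the difference is not close to any known unit.
    """
    delta = value_tomorrow - value_today
    for name, per_day in UNITS_PER_DAY.items():
        if abs(delta - per_day) <= tolerance * per_day:
            return name
    return None

print(guess_unit(71867806, 71867807))             # counter grew by 1
print(guess_unit(71867806, 71867806 + 86400000))  # counter grew by 86,400,000
```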
Once you have an idea of what kind of units you are using, you can estimate how many years ago the "start" date of 0 was. See if that aligns with any of the common calendar systems located at http://en.wikipedia.org/wiki/List_of_calendars. Odds are that the calendar you are using isn't a truly new creation, but a reimplementation of an existing calendar. If it seems very far back, it might be a Julian Date, which has day 0 equivalent to BCE 4713 January 01 12:00:00.0 UT Monday. Julian Dates and Modified Julian Dates are often used in astronomy calculations.
The next major goal is to find Jan 1, 1970 00:00:00. If you can find the number that represents that date, then you simply subtract it from the value in this foreign calendar system and convert the remainder from the discovered units to seconds. That will give you UNIX time, which you can then use with the standard UNIX utilities to convert to a time in any time zone you like.
In the end, you might not be able to be 100% certain that your conversion is exactly the same as the hand-implemented system, but you can test your assumptions about the calendar by plugging in numbers and seeing if they display as you predicted. Use this technique to create a battery of tests which will help you determine how this system handles leap years, etc. Remember, it might not handle them at all!
What time is: 71,867,806 milliseconds from midnight?
There are:
- 86,400,000 ms/day
- 3,600,000 ms/hour
- 60,000 ms/minute
- 1,000 ms/second
Remove and tally these units until you have the time, as follows:
How many days? None because 71,867,806 is less than 86,400,000
How many hours? Maximum times 3,600,000 can be removed is 19 times
71,867,806 - (3,600,000 * 19) = 3,467,806 ms left.
How many minutes? Maximum times 60,000 can be removed is 57 times.
3,467,806 - (60,000 * 57) = 47,806 ms left
How many seconds? Maximum times 1,000 can be removed is 47 times.
47,806 - (1,000 * 47) = 806
So the time is: 19:57:47.806
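The "remove and tally" steps above are repeated integer division with remainder, so they can be sketched in a few lines of Python. The function names are made up for this example, and the input is assumed to be a non-negative count of milliseconds since midnight.

```python
def split_ms_since_midnight(ms):
    """Split milliseconds since midnight into (hours, minutes, seconds, ms)."""
    hours, rest = divmod(ms, 3_600_000)    # 3,600,000 ms per hour
    minutes, rest = divmod(rest, 60_000)   # 60,000 ms per minute
    seconds, millis = divmod(rest, 1_000)  # 1,000 ms per second
    return hours, minutes, seconds, millis

def format_ms_since_midnight(ms):
    """Format as HH:MM:SS.mmm."""
    h, m, s, millis = split_ms_since_midnight(ms)
    return f"{h:02d}:{m:02d}:{s:02d}.{millis:03d}"

print(format_ms_since_midnight(71_867_806))
```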
It is indeed a fairly long time ago if the smallest number is in days. However, assuming you're sure about it I could suggest the following shell command which would be obviously not valid for dates before 1st Jan. 1970:
date -d "@$(echo '(71867806-71853086)*3600*24'|bc)" +%D
or without bc:
date -d "@$(((71867806 - 71853086) * 3600 * 24))" +%D
Sorry again for the messy question, I got the solution now. In JS it looks like this:
var dayZero = new Date(new Date().getTime() - 71867806 * 1000);

MySQL: Order by time (MM:SS)?

I'm currently storing various metadata about videos and one of those bits of data is the length of a video.
So if a video is 10 minutes 35 seconds long, it's saved as "10:35" in the database.
But what I'd like to do is retrieve a listing of videos by length (longest first, shortest last).
The problem I'm having is that if a video is "2:56", it's coming up as longest because, compared as text, the "2" sorts after the "1" in "10:35".
So, how can I order data based on that length field so that "10:35" is recognized as being longer than "2:56" (as per my example)?
SELECT * FROM table ORDER BY str_to_date(meta_time,'%l:%i')
You can find the specific formatters on the MySQL Website.
For example:
%k -> Hour (0..23)
%l -> Hour (1..12)
The easiest choice is to store an integer (seconds) or a float (minutes) instead of a string. So 10:35 would be 635 in seconds or 10.583 in minutes. You can sort by these numerically very easily. And you can output them in the format you'd like with some simple math and string functions.
Some options:
Save it as an integer representing the total number of seconds. "10:35" => 635
Save it as a timestamp object with no date component. "10:35" => MAKETIME(0, 10, 35)
Save it with leading zeros or spaces. "2:25" => " 2:25"
My preference would be for the first option.
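To make the first option concrete, here is a small Python sketch of the conversion in both directions, plus a sort on the numeric value. The helper names are hypothetical, and the input is assumed to always be in "M:SS" or "MM:SS" form with no hours part.

```python
def duration_to_seconds(text):
    """'10:35' -> 635. Assumes minutes:seconds with no hours part."""
    minutes, seconds = text.strip().split(":")
    return int(minutes) * 60 + int(seconds)

def seconds_to_duration(total_seconds):
    """635 -> '10:35', for display after sorting numerically."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"

def longest_first(durations):
    """Sort 'MM:SS' strings by their length in seconds, longest first."""
    return sorted(durations, key=duration_to_seconds, reverse=True)

print(longest_first(["2:56", "10:35", "0:45"]))
```

In the database itself you would store the integer and only format it as text when displaying it.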
You could try to see if
ORDER BY TIME_TO_SEC(timefield)
would parse it correctly, however it is not an optimal approach to store time as strings in the database, and I suggest that you store them as TIME if you are able to. Then you can use standard formatting functions to present them as you like.
I had the same problem - storing videos length in database.
I solved it by using TIME mysql type - it solves all ordering and converting issues.