I'm attempting to build a "last 30 days" dynamic SQL date filter for a user application. The date column is a unix epoch millisecond timestamp.
Previous iterations of the tool let the user choose a date range; I'm now changing it to always use the last 30 days.
The data is stored in Redshift, which does not support from_unixtime.
I have two challenges:
The data is stored in UTC and needs to be filtered with dates in EST (UTC -5).
"Choosing the last 30 days" means cutting off at midnight yesterday, and taking yesterday minus 29.
Previously, my code looked like this:
"datecol" >= DATEDIFF(millisecs, '1969-12-31 19:00:00', ''start date' 00:00:00')
AND "datecol" <= DATEDIFF(millisecs, '1969-12-31 19:00:00', ''end date' 23:59:59')
The application would update the start and end dates as described by the user. This code is adjusted for the time difference.
How can I use GETDATE() and DATEADD() on a Unix timestamp, using the constraints of Redshift SQL?
Thanks.
extract('epoch' from ts) gives you a Unix timestamp in seconds, and you just add 5 hours to query UTC as if it were EST (if EST is UTC-5, then UTC is EST+5). Since your column holds milliseconds, multiply the result by 1000:
between extract('epoch' from ('<<date1>>' + interval '5 hour')) * 1000
and extract('epoch' from ('<<date2>>' + interval '29 hour' - interval '1 second')) * 1000
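The window itself ("midnight yesterday" and back 29 more days, in UTC-5) could also be computed in application code and passed in as two epoch-millisecond bounds. A minimal Python sketch, assuming a fixed UTC-5 offset with no daylight saving; the function name last_30_days_ms is hypothetical:

```python
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5))               # assumption: fixed UTC-5, no DST
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def last_30_days_ms(now_utc):
    """Return (start_ms, end_ms) in epoch milliseconds; end_ms is exclusive."""
    # Midnight at the start of "today" in EST is the cut-off right after yesterday.
    today_est = now_utc.astimezone(EST).replace(
        hour=0, minute=0, second=0, microsecond=0)
    start = today_est - timedelta(days=30)        # yesterday minus 29 days, 00:00 EST
    one_ms = timedelta(milliseconds=1)
    return (start - EPOCH) // one_ms, (today_est - EPOCH) // one_ms
```

The two values would then be used as "datecol" >= start_ms AND "datecol" < end_ms, which avoids the 23:59:59 edge case.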
also, from_unixtime can be expressed in Redshift as the following:
select timestamp 'epoch' + unix_ts_column * interval '1 second'
a bit ugly but works just like that
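The "epoch plus N intervals" idea is the same arithmetic in any language. A small Python sketch for sanity-checking values outside the database, assuming the column holds milliseconds; from_unix_ms is a hypothetical helper name:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # naive datetime, interpreted as UTC

def from_unix_ms(ms):
    """Convert epoch milliseconds to a naive UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)

print(from_unix_ms(1449532800000))  # 2015-12-08 00:00:00
```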
I think you want to write a User Defined Function (UDF) for your Redshift database using Python and the python standard datetime module. See http://docs.aws.amazon.com/redshift/latest/dg/user-defined-functions.html
Follow the section titled Creating a Scalar Python UDF.
I don't quite understand your query or the context, but I think you can figure out how to get what you want using UDFs.
For example, to get the milliseconds between two datetimes (one in UTC, one in EST), you would write it like the following (not tested):
CREATE FUNCTION datediff_py(a datetime, b datetime)
returns float
stable
as $$
#python code goes here between the $$
from datetime import datetime
FMT = '%Y-%m-%d %H:%M:%S' #dates like '2016-12-24 23:59:59'
tdelta = datetime.strptime(a + " UTC", FMT + " %Z") - datetime.strptime(b + " EST", FMT + " %Z")
return tdelta.total_seconds()*1000
$$ language plpythonu;
This computes the milliseconds between an SQL datetime a that is in UTC and b that is in EST. The %Z format is used for timezones. A usage would be:
"datecol" >= datediff_py('1969-12-31 19:00:00', user_date)
Of course Unix epoch is actually '1970-01-01 00:00:00'.
There are plenty of other date functions in the Python standard library datetime module, so you can write other UDFs if you need things like GETDATE() or DATEADD(), for example using timedelta.
Related
I need to modify the output of a datetime field so that it returns the first day of the week in which that date falls. For example, if the date is 9/2/2016, it should return 8/29/2016 because that's the Monday of that week (I want Monday, not Sunday). However, I also need to convert the timestamp to a different timezone. The result is that I end up having to convert the timezone twice:
CONVERT_TZ(timestamp, '+00:00', '+05:00') - INTERVAL WEEKDAY(CONVERT_TZ(timestamp, '+00:00', '+05:00')) day
I can't simply perform the INTERVAL calculation on the UTC datetime and then convert the timezone on the result because the timezone conversion may affect the result of the WEEKDAY function, e.g. a datetime of 2016-9-5 00:00 UTC will actually fall on 9/4 for EST, thus causing it to be part of a different week.
Is there a way to avoid having to make two calls to CONVERT_TZ in the SELECT statement?
You can put it into a subquery.
SELECT converted - INTERVAL WEEKDAY(converted) DAY AS day
FROM (SELECT CONVERT_TZ(timestamp, '+00:00', '+05:00') AS converted
FROM yourTable) AS x
Another way is to assign a user variable. To be able to use it twice, put the assignment into the condition part of an IF() expression -- that ensures that the assignment will be done before the uses.
IF(@converted := CONVERT_TZ(timestamp, '+00:00', '+05:00'),
@converted - INTERVAL WEEKDAY(@converted) DAY,
NULL) AS day
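The "convert once, then reuse" idea, and the reason the conversion has to happen before the weekday calculation, can be checked with a short Python sketch. It assumes a fixed hour offset and uses the question's own example; week_start is a hypothetical name:

```python
from datetime import datetime, timedelta, timezone

def week_start(ts_utc, offset_hours):
    """Monday of the week containing ts_utc, as seen at the given UTC offset."""
    local = ts_utc.astimezone(timezone(timedelta(hours=offset_hours)))  # convert once
    return (local - timedelta(days=local.weekday())).date()             # weekday(): Monday == 0

ts = datetime(2016, 9, 5, 0, 0, tzinfo=timezone.utc)
print(week_start(ts, 0))    # 2016-09-05
print(week_start(ts, -5))   # 2016-08-29 (locally it is still Sunday 9/4)
```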
I have the following query from Group OHLC-Stockmarket Data into multiple timeframes - Mysql.
SELECT
FLOOR(MIN(`timestamp`)/"+period+")*"+period+" AS timestamp,
SUM(amount) AS volume,
SUM(price*amount)/sum(amount) AS wavg_price,
SUBSTRING_INDEX(MIN(CONCAT(`timestamp`, '_', price)), '_', -1) AS `open`,
MAX(price) AS high,
MIN(price) AS low,
SUBSTRING_INDEX(MAX(CONCAT(`timestamp`, '_', price)), '_', -1) AS `close`
FROM transactions_history -- this table has 3 columns (timestamp, amount, price)
GROUP BY FLOOR(`timestamp`/"+period+")
ORDER BY timestamp
In my select statement, I am trying to understand what FLOOR(MIN(timestamp)/"+period+")*"+period+" AS timestamp is doing.
I also need to convert this back to a mysql date/time Y-M-D H:i:s string or a UTC timestamp for parsing via javascript.
Let's assume that +period+ is 86400 (The number of seconds in a day)
Let's assume that the timestamp is '2015-12-08 20:58:58'
From what I can see, it takes the timestamp, which internally is stored as an integer and divides by 86400.
'2015-12-08 20:58:58' / 86400 = 233231576.4566898000
It then uses the FLOOR operation which would make it 233231576 then multiplies by 86400 again (I assume that this is to ensure rounding to the day)
I end up with 20151208166400.
So that's the 8th December 2015 but I also have 166400 which I have no idea what it is?
So now the second part of the question is, how to convert this integer to 2015-12-08 %H:%i:%s or even a UTC timestamp for parsing via Javascript.
I mentioned the problem in the comment, but not a fix. The problem is that the proposed code is for a unix timestamp, not a datetime value.
This can be fixed by doing appropriate conversions
SELECT FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(MIN(timestamp)) / $period) * $period)
This gives you the flexibility of using an arbitrary number of seconds for the groupings.
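The odd 166400 in the question comes from doing the division on the DATETIME's numeric form (YYYYMMDDhhmmss) rather than on a count of seconds. A short Python sketch of both calculations, using the question's example value and assuming it is a UTC time:

```python
import calendar
from datetime import datetime, timezone

PERIOD = 86400  # seconds in a day

# A DATETIME used as a number becomes YYYYMMDDhhmmss, so the rounding is meaningless.
as_number = 20151208205858
print(as_number // PERIOD * PERIOD)   # 20151208166400

# Converting to a Unix timestamp first makes the rounding land on midnight UTC.
unix_ts = calendar.timegm((2015, 12, 8, 20, 58, 58))
rounded = unix_ts // PERIOD * PERIOD
print(datetime.fromtimestamp(rounded, timezone.utc))  # 2015-12-08 00:00:00+00:00
```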
You're right that FLOOR(timestamp / 86400) * 86400 is a crude way of rounding a UNIX-style timestamp (seconds since 1970-01-01 00:00UTC) to midnight on the present day UTC.
If that's what you're trying to do, I suggest you try this kind of MySQL code:
SELECT DATE_FORMAT(DATE(`timestamp`), '%Y-%m-%d'),
...
GROUP BY DATE(`timestamp`)
This uses MySQL's built in date arithmetic to turn a timestamp into midnight.
But you should be careful of one thing. Those timestamps are all stored in UTC (f/k/a Greenwich Mean Time). When you do date arithmetic with them, or pull them out of the database to use them, they're automatically converted to local time according to your MySQL time zone settings.
It rounds the timestamp down to the period (e.g. a day).
DATE_FORMAT( DATE( FLOOR(MIN(timestamp)/"+period+")*"+period+" ) , '%Y-%m-%d %H:%i:%s' )
If period==day consider using only MySQL period rounding by DAY().
Convert a Date object to a string, according to universal time:
var d = new Date();
var n = d.toUTCString();
The result of n will be:
Mon, 28 Dec 2015 12:57:32 GMT
I need to get the timestamp 7 days from the current time, in milliseconds. I tried date_sub with now(), but it didn't work for me. How do we do this in Hive? I need exactly the interval between the current Unix timestamp and 7 days from now in my query. Also, is there any provision to select a time zone such as UTC+5:30?
I could not find information about millisecond based time calculations in HIVE.
unix_timestamp() is the current timestamp, but it does not have milliseconds.
The offset is 7 days*24 hours/day*3600 secs/hour = 604800 seconds (604800000 milliseconds)
So the timestamp of the current time plus 7 days would be unix_timestamp() + 604800
The UTC part is trickier; you can use to_utc_timestamp, giving it your calculated timestamp, and the timezone it is coming from (as a date). It will return a date string, which you will pass through unix_timestamp()
In other words, assuming it is coming from PST, you should use:
select unix_timestamp(to_utc_timestamp(from_unixtime(unix_timestamp() + 604800), 'PST')) from dual;
See the documentation here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
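The offset arithmetic is easy to verify outside Hive. A minimal Python sketch, assuming millisecond timestamps and a fixed UTC+5:30 offset for display; the helper names are hypothetical:

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_WEEK = 7 * 24 * 3600                   # 604800 seconds
UTC_PLUS_0530 = timezone(timedelta(hours=5, minutes=30))

def in_seven_days_ms(now_ms):
    """Epoch milliseconds exactly 7 days after now_ms."""
    return now_ms + SECONDS_PER_WEEK * 1000

def as_utc_plus_0530(ms):
    """Render epoch milliseconds as a datetime at UTC+5:30."""
    return datetime.fromtimestamp(ms / 1000, UTC_PLUS_0530)
```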
I have read various questions here on Stackoverflow about the use of FROM_UNIXTIME but none directly deal with what I am trying to do. I have one timestamp in a variable coming from php (that has been reformatted - e.g. 25 March 2014) to a function which uses a database query to determine if there are other entries in the database that have the same date (not time). I've run across various methods for formatting and comparing timestamp entries using MySql and ended up with the following but I understand that it isn't very efficient. Does anyone know of a better way to accomplish this?
FROM_UNIXTIME(sd.timestart, "%e %M %Y") = ?'
where the variable in my array for comparison is the date format listed above. This accomplishes what I want but, again, I don't think it is the most efficient way to get this done. Any advice and/or ideas will be much appreciated.
*EDIT*
My timestamp is stored as an integer so I'm trying to use:
$thissessiondate = strtotime($date->timestart, strtotime('today'));
and
$tomorrowdate = strtotime($date->timestart, strtotime('tomorrow'));
to trim to midnight, but I get an error (strtotime() expects parameter 2 to be long), and when I move 'today' to the first argument position, I get a conversion to 11 pm instead of 0:00...? I'm making some progress, but my very incomplete knowledge of both PHP and MySQL is holding me back.
If you can avoid it, don't wrap columns used in predicates in expressions.
Have your predicates on bare columns to make index range scans possible. You want the datatype conversion to happen over on the literal side of the predicate, wherever possible.
The STR_TO_DATE function is the most convenient for this.
Assuming the timestart column is DATE, DATETIME or TIMESTAMP (which it really should be, if it represents a point in time.)
WHERE sd.timestart >= STR_TO_DATE( ? , "%e %M %Y")
AND sd.timestart < STR_TO_DATE( ? , "%e %M %Y") + INTERVAL 1 DAY
Effectively, what that's doing is taking the string passed in as the first argument to the STR_TO_DATE function, MySQL is going to convert that string to a DATETIME, based on the format specified as the second argument. And that effectively becomes a literal that MySQL can use to compare to the stored values in the column.
If there's an appropriate index available, MySQL will consider an index range scan operation to satisfy that predicate.
You'd need to pass in the same value twice, but that's not really a problem.
On the second line, we're just adding a day to the same value. So what MySQL is seeing is this:
WHERE sd.timestart >= STR_TO_DATE( '25 March 2014' , "%e %M %Y")
AND sd.timestart < STR_TO_DATE( '25 March 2014' , "%e %M %Y") + INTERVAL 1 DAY
In terms of performance, that's equivalent to:
WHERE sd.timestart >= '2014-03-25 00:00:00'
AND sd.timestart < '2014-03-26 00:00:00'
If you do it the other way around, and wrap timestart in a function, that's going to require MySQL to evaluate the function on every single row (or at least, on every row that isn't filtered out by another predicate first.)
IMPORTANT NOTE
Be aware that MySQL interprets datetime values as being in the timezone of the MySQL connection, which defaults to the timezone setting of the MySQL server. MySQL is going to interpret datetime literals in the current setting of the timezone. For example, if MySQL timezone is set to +00:00, then datetime literals will be interpreted as UTC.
I assumed the format string matches the data being passed in; I don't use %e or %m. The %Y is a four digit year. (The list of format elements is in the MySQL documentation, under the DATE_FORMAT function.)
Reference: http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_str-to-date
If your timestart column is INTEGER or other numeric datatype, representing a number of seconds or some other unit of time since the beginning of an era, you can use the same approach for performance benefits.
In the predicate, reference bare columns from the table, and do any conversions required on the literal side.
If you aren't using MySQL functions to do the conversion to "seconds since Jan 1, 1970 UTC" when rows are inserted (which is really what the TIMESTAMP datatype is doing internally), then I wouldn't recommend using MySQL functions to do the conversion in the query either.
If you're doing the conversion from date and time to an integer type "timestamp" in PHP, then I'd do the inverse conversion in PHP as well, and do the trimming to midnight and the adding of a day in PHP.
In that case, your MySQL query would be of the simple form:
WHERE sd.timestart >= ?
AND sd.timestart < ?
Where you would pass in the appropriate integer values, to compare to the INTEGER timestamp column.
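The application-side trimming to midnight and adding a day might look like the following. The question uses PHP; this is a Python sketch of the same idea, assuming the stored integers are seconds since 1970 in UTC and the input looks like "25 March 2014". day_bounds is a hypothetical name:

```python
import calendar
from datetime import datetime, timedelta

def day_bounds(date_text):
    """Return (start, end) epoch seconds for the given day; end is exclusive."""
    day = datetime.strptime(date_text, "%d %B %Y")    # parses to midnight
    start = calendar.timegm(day.timetuple())           # assumption: the day is in UTC
    end = calendar.timegm((day + timedelta(days=1)).timetuple())
    return start, end

print(day_bounds("25 March 2014"))  # (1395705600, 1395792000)
```

The two integers are then bound to the two placeholders in the query.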
Note that MySQL does provide a function for converting to "seconds since Jan 1 1970 UTC", so if timestart is seconds since Jan 1 1970 UTC, then something like this is valid:
WHERE sd.timestart >= UNIX_TIMESTAMP(STR_TO_DATE( '25 March 2014' , "%e %M %Y"))
AND sd.timestart < UNIX_TIMESTAMP(STR_TO_DATE( '25 March 2014' , "%e %M %Y") + INTERVAL 1 DAY)
BUT... again, be aware of timezone conversion issues if the MySQL database has a different timezone setting than the web server. If you are going to store an "integer", then I wouldn't muck that up with the conversion that MySQL does, which may not be exactly the same as the conversion functions the web server does.
If you store your date as an int timestamp, you can do this
floor(sd.timestart/86400)=floor(UNIX_TIMESTAMP(NOW())/86400)
This will get everything in your database that is from the same UTC day. (Use floor rather than round: round would split days at noon instead of midnight.)
For example:
SELECT id FROM uploads WHERE (approved=0 OR approved is NULL) AND floor(uploads.date/86400)<=floor(UNIX_TIMESTAMP(NOW())/86400) order by uploads.date DESC LIMIT 20
Will display all the uploads for today and the days before, without showing the future uploads. 86400 is the number of seconds in one day.
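Bucketing by day with division by 86400 can be checked quickly in Python. This is a minimal sketch with hypothetical names; it also shows why flooring is the safer choice, since rounding moves the bucket boundary from midnight to noon:

```python
SECONDS_PER_DAY = 86400

def utc_day(ts):
    """Number of whole UTC days since 1970-01-01 for an epoch-seconds value."""
    return ts // SECONDS_PER_DAY

midnight = 1449532800              # 2015-12-08 00:00:00 UTC
afternoon = midnight + 13 * 3600   # 2015-12-08 13:00:00 UTC

print(utc_day(midnight) == utc_day(afternoon))   # True: same day
print(round(midnight / SECONDS_PER_DAY) ==
      round(afternoon / SECONDS_PER_DAY))        # False: rounding splits the day at noon
```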
I have a table 't' with date(yyyy-mm-dd), hour(1-12), minute(00-59), ampm(a/p), and timezone(pst/est) fields.
How can I select the rows that are <= now()? (ie. already happened)
Thank you for your suggestions!
edit: this does it without attention to the hour/minute/ap/tz fields:
SELECT * FROM t WHERE date <= now()
Here's one way to do it - combine all your seconds, minutes, etc into a date and compare to NOW(), making sure you do the comparison in the same time-zone. (Untested):
SELECT *
FROM t
LEFT JOIN y ON t.constant=y.constant
WHERE CONVERT_TZ(STR_TO_DATE(CONCAT(date,' ',hour,':',minute,' ',ampm,'m'), -- ampm holds 'a'/'p'; %p expects AM/PM
'%Y-%m-%d %l:%i %p' ),
timezone,"SYSTEM") < NOW();
If your hour is 01 - 12 not 1-12 then use %h instead of %l in the STR_TO_DATE.
The STR_TO_DATE tries to stick your date and time columns together and convert them into a date.
The CONVERT_TZ(...,timezone,"SYSTEM") converts this date from whatever timezone is specified in the timezone column to system time.
This is then compared to NOW(), which is always in system time.
As an aside, perhaps you should make a single column date using MySQL's date datatype, as it's a lot easier to do arithmetic on that!
For reference, here is a summary of very useful mysql date functions where you can read up on those featuring in this answer.
Good luck!
SELECT * FROM t
WHERE `date`<=DATE_SUB(curdate(), INTERVAL 1 DAY)
OR (
`date`<=DATE_ADD(curdate(), INTERVAL 1 DAY)
AND
CONVERT_TZ(CAST(CONCAT(`date`,' ',(`hour` % 12) + IF(ampm='p',12,0),':',`minute`,':00') AS DATETIME),'GMT',`timezone`)<=NOW()
)
Rationale for date<=DATE_[ADD|SUB](curdate(), INTERVAL 1 DAY):
The fancy conversion is quite an expensive operation, so we don't want it to run on the complete table. This is why we pre-select against an UNCHANGED date field (possibly using an index). In no timezone can an event that is more than a day in the current timezone's past be in the future, and in no timezone can an event that is more than a day in the current timezone's future be in the past.
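The trickiest part of combining the separate fields is the 12-hour clock: 12 a.m. is hour 0 and 12 p.m. is hour 12. A Python sketch of the whole conversion, assuming fixed offsets for 'est' and 'pst' (no daylight saving) and hypothetical function names:

```python
from datetime import datetime, timedelta, timezone

OFFSETS = {"est": -5, "pst": -8}   # assumption: fixed offsets, no DST

def event_time(date_str, hour, minute, ampm, tz):
    """Build an aware datetime from the table's date/hour/minute/ampm/timezone fields."""
    hour24 = hour % 12 + (12 if ampm == "p" else 0)   # 12a -> 0, 12p -> 12, 1p -> 13
    year, month, day = map(int, date_str.split("-"))
    return datetime(year, month, day, hour24, minute,
                    tzinfo=timezone(timedelta(hours=OFFSETS[tz])))

def has_happened(row, now_utc):
    """True if the event described by row is at or before now_utc."""
    return event_time(*row) <= now_utc
```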