I have a date table, which has a column date (PK). The CREATE script is here:
CREATE TABLE date_table (
date DATE
,year INT(4)
,month INT(2)
,day INT(2)
,month_pad VARCHAR(2)
,day_pad VARCHAR(2)
,month_name VARCHAR(10)
,year_month_index INT(6)
,year_month_hypname VARCHAR(7)
,year_month_name VARCHAR(15)
,week_day_index INT(1)
,day_name VARCHAR(9)
,week INT(2)
,week_interval VARCHAR(13)
,weekend_fl INT(1)
,quarter_num INT(1)
,quarter_num_pad VARCHAR(2)
,quarter_name VARCHAR(2)
,year_quarter_index INT(6)
,year_quarter_name VARCHAR(7)
,PRIMARY KEY (date)
);
Now I would like to select rows from this table with dynamic values, using functions such as LAST_DAY() or DATE_SUB(DATE_FORMAT(SYSDATE(),'%Y-01-01'), INTERVAL X YEAR), etc.
When one of my queries failed and didn't execute in 30 secs, I knew something was fishy, and it looks like the reason is that the index on the primary key column is not used. Here are my results (sorry for using an image instead of copying the queries, but I thought it's concise enough for this purpose, and the queries are short/simple enough):
First of all, it's strange that BETWEEN works differently than >= and <=. Secondly, it looks like the index is only used for constant values. If you look closely, you can see that on the right side (where >= and <= are used), it shows ~9K rows, which is half of the rows in the table (the table has ~18k rows, dates from 2000-01-01 to 2050-12-31).
SYSDATE() returns the time at which it executes. This differs from the behavior for NOW(), which returns a constant time that indicates the time at which the statement began to execute. (Within a stored function or trigger, NOW() returns the time at which the function or triggering statement began to execute.)
-- https://dev.mysql.com/doc/refman/5.7/en/date-and-time-functions.html#function_sysdate
That is, the Optimizer does not see this as a "constant". Otherwise, the Optimizer eagerly evaluates any "constant expressions", then tries to take advantage of knowing the value.
See also the sysdate_is_now option.
Bottom line: Don't use SYSDATE() for normal datetime usage; use NOW() or CURDATE().
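A minimal sketch of the difference described above, assuming the date_table from the question with the primary-key column named date as in the CREATE script. The first two statements follow the example in the MySQL manual. The exact EXPLAIN output depends on server version and table statistics, so no plan is claimed here; compare the type and rows columns yourself.

```sql
-- SYSDATE() is evaluated each time it is called, so two calls in one
-- statement can differ; NOW() is fixed when the statement starts.
SELECT SYSDATE(), SLEEP(2), SYSDATE();  -- the two timestamps differ by about 2 seconds
SELECT NOW(),     SLEEP(2), NOW();      -- the two timestamps are identical

-- Bounds built on SYSDATE(): not treated as constants by the optimizer.
EXPLAIN
SELECT *
FROM date_table t
WHERE t.date >= LAST_DAY(SYSDATE())
  AND t.date <= LAST_DAY(SYSDATE());

-- Bounds built on CURDATE(): evaluated once, then usable for an index lookup.
EXPLAIN
SELECT *
FROM date_table t
WHERE t.date >= LAST_DAY(CURDATE())
  AND t.date <= LAST_DAY(CURDATE());
```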
Looks like if I use CURRENT_DATE() (or NOW()) instead of SYSDATE(), it's working. Both of these queries:
SELECT *
FROM date_table t
WHERE 1 = 1
AND t.ddate >= LAST_DAY(CURRENT_DATE()) AND t.ddate <= LAST_DAY(CURRENT_DATE());
SELECT *
FROM date_table t
WHERE 1 = 1
AND t.ddate >= LAST_DAY(NOW()) AND t.ddate <= LAST_DAY(NOW());
Give the same result, which is this:
I will accept my answer as a solution, but I'm still looking for an explanation. I thought it might have something to do with SYSDATE() not being a DATE, but NOW() is also not a DATE...
EDIT: Forgot to add, BETWEEN is also working as I see.
Related
I have a system that checks websites for certain data at set frequencies. Each website has its own check frequency in the crawl_frequency column. This value is in days.
I have a table like this
CREATE TABLE `websites` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`domain` VARCHAR(191) NOT NULL COLLATE 'utf8mb4_unicode_ci',
`crawl_frequency` TINYINT(3) UNSIGNED NOT NULL DEFAULT '3',
`last_crawled_start` TIMESTAMP NULL DEFAULT NULL,
PRIMARY KEY (`id`)
)
I want to run queries to find new websites to check at their specified check frequency/interval. At the moment I have this query which works fine if the crawl_frequency for a website is set to one day.
SELECT domain
FROM websites
WHERE last_crawled_start <= (now() - INTERVAL 1 DAY)
LIMIT 1
Is there any way in a MySQL query to use the value in the crawl_frequency column for each row in the WHERE clause?
For example, I'd like to do something like:
SELECT domain
FROM websites
WHERE last_crawled_start <= (now() - INTERVAL {{INSERT VALUE OF CRAWL FREQUENCY FOR THIS PARTICULAR WEBSITE}} DAY)
LIMIT 1
You can do it like so:
SELECT domain
FROM websites
WHERE last_crawled_start <= NOW() - INTERVAL crawl_frequency DAY
LIMIT 1
Yes, really.
You can try to use DATEDIFF function, like this:
SELECT domain FROM websites
WHERE DATEDIFF(NOW(), last_crawled_start) > crawl_frequency
LIMIT 1;
Everything I read about MySQL said it can't be a variable, but you can use another function, e.g.:
SELECT * FROM websites
WHERE
(unix_timestamp() - unix_timestamp(last_crawled_start))/86400.0 > crawl_frequency
SELECT MIN(classification) AS classification
     , MIN(START) AS START
     , MAX(next_start) AS END
     , SUM(duration) AS seconds
FROM ( SELECT *
            , CASE WHEN (duration < 20*60) THEN CASE WHEN (duration = -1) THEN 'current_session' ELSE 'session' END
                   ELSE 'break'
              END AS classification
            , CASE WHEN (duration > 20*60) THEN ((@sum_grouping := @sum_grouping +2)-1)
                   ELSE @sum_grouping
              END AS sum_grouping
       FROM ( SELECT *
                   , CASE WHEN next_start IS NOT NULL THEN TIMESTAMPDIFF(SECOND, START, next_start) ELSE -1 END AS duration
              FROM ( SELECT id, studentId, START
                          , (SELECT MIN(START)
                             FROM attempt AS sub
                             WHERE sub.studentId = main.studentId
                               AND sub.start > main.start
                            ) AS next_start
                     FROM attempt AS main
                     WHERE main.studentId = 605
                     ORDER BY START
                   ) AS t1
            ) AS t2
       WHERE duration != 0
     ) AS t3
GROUP BY sum_grouping
ORDER BY START DESC, END DESC
Explanation and goal
The attempt table records a student's attempt at some activity, during a session. If two attempts are less than 20 minutes apart, we consider those to be the same session. If they are more than 20 minutes apart, we assume they took a break.
My goal with this query is to take all of the attempts and condense them down in a list of sessions and breaks, with the start time of each session, the end time (defined as the start of the subsequent session), and how long the session was. The classification is whether it is a session, a break, or the current session.
The above query does all of that, but is too slow. How can I improve the performance?
How the current query works
The innermost queries select an attempt's start time and the subsequent attempt's start time, along with the duration between those values.
Then, the @sum_grouping variable and the sum_grouping column are used to split the attempts into the sessions and breaks. @sum_grouping is only ever increased when an attempt is more than 20 minutes long (i.e. a break), and it is always increased by 2. However, sum_grouping is set to a value of one less than that for that "break". If an attempt is less than 20 minutes long, then the current @sum_grouping value is used, without modification. As a result, all breaks are distinct odd values, and all sessions (whether of 1 or more attempts) end up as distinct even numbers. This allows the GROUP BY portion to correctly separate the attempts into sessions and breaks.
Example:
Attempt type   @sum_grouping   sum_grouping
non-break      0               0
non-break      0               0
break          2               1
break          4               3
non-break      4               4
break          6               5
As you can see, all the breaks will be grouped by sum_grouping separately with distinct odd values and all the non-breaks will be grouped together as sessions with the even values.
The MIN(classification) simply forces "current session" to be returned when both "session" and "current session" are present within a grouped row.
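As a possible alternative to the correlated subquery that finds each attempt's next_start, here is a hedged sketch using the LEAD() window function. This is a swapped-in technique, not the original query's method, and it assumes MySQL 8.0 or later (the question does not state a version). It also behaves differently when two attempts share the same start: LEAD() returns that same start for the earlier row (a duration of 0, which the outer query already filters out with duration != 0), whereas the subquery skips ahead to the next strictly later start.

```sql
-- Sketch only: requires window functions (MySQL 8.0+).
-- Computes next_start in one pass instead of one subquery per row.
SELECT id
     , studentId
     , start
     , LEAD(start) OVER (PARTITION BY studentId ORDER BY start) AS next_start
FROM attempt
WHERE studentId = 605
ORDER BY start;
```

The result could replace the innermost derived table (t1); the rest of the query would stay as written.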
OUTPUT OF SHOW CREATE TABLE attempt
CREATE TABLE attempt (
id int(11) NOT NULL AUTO_INCREMENT,
caseId int(11) NOT NULL DEFAULT '0',
eventId int(11) NOT NULL DEFAULT '0',
studentId int(11) NOT NULL DEFAULT '0',
activeUuid char(36) NOT NULL,
start timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
end timestamp NULL DEFAULT NULL,
outcome float DEFAULT NULL,
response varchar(5000) NOT NULL DEFAULT '',
PRIMARY KEY (id),
KEY activeUuid (activeUuid),
KEY caseId (caseId,activeUuid),
KEY end (end),
KEY start (start),
KEY studentId (studentId),
KEY attempt_idx_studentid_stat_id (studentId,start,id),
KEY attempt_idx_studentid_stat (studentId,start)
) ENGINE=MyISAM AUTO_INCREMENT=298382 DEFAULT CHARSET=latin1
(This is not a proper Answer, but here goes anyway.)
Try not to nest 'derived' tables.
I see a lot of syntax errors.
Move from MyISAM to InnoDB.
INDEX(a, b) handles situations where you need INDEX(a), so DROP the latter.
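Applied to the attempt table above, the last two suggestions might look like the following sketch. The index names are taken from the SHOW CREATE TABLE output. The engine change is shown only as an illustration: under a strict sql_mode that includes NO_ZERO_DATE, the zero-date default on start may be rejected when the table is rebuilt, so try it on a copy first.

```sql
-- attempt_idx_studentid_stat_id (studentId, start, id) already has
-- (studentId) and (studentId, start) as leftmost prefixes, so the two
-- indexes below are redundant. The activeUuid index is NOT redundant:
-- in caseId (caseId, activeUuid) it is not the leftmost column.
ALTER TABLE attempt
    DROP INDEX studentId,
    DROP INDEX attempt_idx_studentid_stat;

-- Engine change suggested above; test on a copy of the table first.
ALTER TABLE attempt ENGINE=InnoDB;
```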
Clearly, I am missing the forest for the trees...I am missing something obvious here!
Scenario:
I've a typical table asset_locator with multiple fields:
id, int(11) PRIMARY
logref, int(11)
unitno, int(11)
tunits, int(11)
operator, varchar(24)
lineid, varchar(24)
uniqueid, varchar(64)
timestamp, timestamp
My current challenge is to SELECT records from this table based on a date range. More specifically, a date range using the MAX(timestamp) field.
So...when selecting I need to start with the latest timestamp value and go back 3 days.
EX: I select all records WHERE the lineid = 'xyz' and going back 3 days from the latest timestamp. Below is an actual example (of the dozens) I've been trying to run.
MySQL returns a single row with all NULL values for the following:
SELECT id, logref, unitno, tunits, operator, lineid,
uniqueid, timestamp, MAX( timestamp ) AS maxdate
FROM asset_locator
WHERE 'maxdate' < DATE_ADD('maxdate',INTERVAL -3 DAY)
ORDER BY uniqueid DESC
There MUST be something obvious I am missing. If anyone has any ideas, please share.
Many thanks!
MAX() is an aggregate function, which means your SELECT will always return one row containing the maximum value, unless you use GROUP BY, but it looks like that's not what you need.
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_max
If you need all the entries between MAX(timestamp) and 3 days before, then you need to do a subselect to obtain the max date, and after that use it in the search condition. Like this:
SELECT id, logref, unitno, tunits, operator, lineid, uniqueid, timestamp
FROM asset_locator
WHERE timestamp >= DATE_ADD( (SELECT MAX(timestamp) FROM asset_locator), INTERVAL -3 DAY)
It will still run efficiently as long as you have an index defined on timestamp column.
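If the lineid filter from the question is added, a composite index may serve both the MAX() lookup and the range scan. A sketch, assuming that "latest timestamp" means the latest one for that lineid (the question does not say whether it is per line or global); the index name idx_lineid_timestamp is hypothetical.

```sql
-- Hypothetical index name; the column order (lineid, timestamp) lets MySQL
-- find the newest row for one lineid and then scan the 3-day range.
ALTER TABLE asset_locator
    ADD INDEX idx_lineid_timestamp (lineid, `timestamp`);

SELECT id, logref, unitno, tunits, operator, lineid, uniqueid, `timestamp`
FROM asset_locator
WHERE lineid = 'xyz'
  AND `timestamp` >= DATE_SUB(
        (SELECT MAX(`timestamp`) FROM asset_locator WHERE lineid = 'xyz'),
        INTERVAL 3 DAY)
ORDER BY `timestamp` DESC;
```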
Note: In your example
WHERE 'maxdate' < DATE_ADD('maxdate',INTERVAL -3 DAY)
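Here you are actually comparing the string 'maxdate', not the column alias, because of the quotes. DATE_ADD() cannot parse that string as a date and returns NULL, so the condition is never true and no row matches. Because the SELECT contains MAX() without GROUP BY, it still returns one row, and that's why you were seeing NULL for all fields.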
Edit: Oops, forgot the "FROM asset_locator" in query. It got lost at some point when writing the answer :)
select
start_date,stop_date_original
from dates
where
start_date is not null
and stop_date_original is not null
and start_date > str_to_date('10/10/2009','%d/%m/%Y')
/*and stop_date_original < str_to_date('01/24/2013','%d/%m/%Y')*/
This query works fine, but when I uncomment the last line, or use it to replace the one before, either the result is unaffected or I get an empty result set.
Are there issues with this approach that might cause this behaviour?
Also, are the null checks intrinsically necessary across different database systems?
Stop date has to be 24/01/2013, not 01/24/2013:
select
start_date,
stop_date_original
from
dates
where
start_date is not null
and stop_date_original is not null
and start_date > str_to_date('10/10/2009','%d/%m/%Y')
and stop_date_original < str_to_date('24/01/2013','%d/%m/%Y')
Or you have to swap day and month in the format string: str_to_date('01/24/2013','%m/%d/%Y').
Also, if start_date or stop_date_original is null, the comparison evaluates to null anyway, so you don't need the explicit not-null checks, although they make things more readable.
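Both points can be checked directly. A small sketch with literal values only, so no table is needed; STR_TO_DATE() returns NULL (with a warning) when the string does not yield a legal date for the given format.

```sql
SELECT STR_TO_DATE('01/24/2013', '%d/%m/%Y') AS day_first    -- NULL: there is no month 24
     , STR_TO_DATE('01/24/2013', '%m/%d/%Y') AS month_first  -- 2013-01-24
     , STR_TO_DATE('24/01/2013', '%d/%m/%Y') AS swapped;     -- 2013-01-24

-- A comparison against NULL is NULL, never true, so a WHERE clause using
-- the mismatched format filters out every row.
SELECT DATE '2012-06-01' < STR_TO_DATE('01/24/2013', '%d/%m/%Y') AS cmp;  -- NULL
```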
table:
--duedate timestamp
--submissiondate timestamp
--blocksreq numeric
--file clob
--email varchar2(60)
Each entry is a file which will take blocksreq blocks to accomplish. There are 8 blocks allotted per day (but this could be modified later). Before I insert into the table, I want to make sure there are enough blocks to accomplish it in the timeframe between NOW() and @duedate.
I was thinking of the following, but i think i am doing it wrong:
R1 = select DAY(), @blocksperday - sum(blocksreq) as free
     from table
     where @duedate between NOW() and @duedate
     group by DAY()
     order by DAY() desc
R2 = select sum(a.free) from R1 as a;
if(R2[0] <= @blocksreq){ insert into table; }
pardon the partial pseudocode.
SQL FIDDLE: http://sqlfiddle.com/#!2/5bda5
Warning: my SQL Fiddle has garbage code, as I don't know how to make a lot of test cases, nor how to set the duedate to NOW()+5 days.
Something like this? (wasn't sure how partial days were handled so ignored that part)
CREATE TABLE `DatTable` (
`duedate` datetime DEFAULT NULL,
`submissiondate` datetime DEFAULT NULL,
`blocksreq` smallint(6) DEFAULT NULL
);
SET @duedate:='2012-10-15';
SET @submissiondate:=CURRENT_TIMESTAMP;
SET @blocksreq:=5;
INSERT INTO DatTable(duedate,submissiondate,blocksreq)
SELECT @duedate,@submissiondate,@blocksreq
FROM DatTable AS b
WHERE duedate > @submissiondate
HAVING COALESCE(SUM(blocksreq),0) <= DATEDIFF(@duedate,@submissiondate)*8-@blocksreq;