How to optimise my complex MySQL query? - mysql

Table
Each row represents a video that was on air at particular time on particular date. There are about 1600 videos per day.
CREATE TABLE `air_video` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`date` DATE NOT NULL,
`time` TIME NOT NULL,
`duration` TIME NOT NULL,
`asset_id` INT(10) UNSIGNED NOT NULL,
`name` VARCHAR(100) NOT NULL,
`status` VARCHAR(100) NULL DEFAULT NULL,
`updated` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE INDEX `date_2` (`date`, `time`),
INDEX `date` (`date`),
INDEX `status` (`status`),
INDEX `asset_id` (`asset_id`)
)
ENGINE=InnoDB
Task
There are two conditions.
Each video must be shown not more than 24 times per day.
Each video must be in rotation no longer than 72 hours.
"In rotation" means the time span between the first and the last time the video was on air.
So I need to select all videos that violate those conditions, given user-specified date range.
The result must be grouped by day and by asset_id (video id). For example:
date asset_id name dailyCount rotationSpan
2012-04-27 123 whatever_the_name 35 76
2012-04-27 134 whatever_the_name2 39 20
2012-04-28 125 whatever_the_name3 26 43
Query
By now I have written this query:
SELECT
t1.date, t1.asset_id, t1.name,
(SELECT
COUNT(t3.asset_id)
FROM air_video AS t3
WHERE t2.asset_id = t3.asset_id AND t3.date = t1.date
) AS 'dailyCount',
MIN(CONCAT(t2.date, ' ', t2.time)) AS 'firstAir',
MAX(CONCAT(t2.date, ' ', t2.time)) AS 'lastAir',
ROUND(TIMESTAMPDIFF(
MINUTE,
MIN(CONCAT(t2.date, ' ', t2.time)),
MAX(CONCAT(t2.date, ' ', t2.time))
) / 60) as 'rotationSpan'
FROM
air_video AS t1
INNER JOIN
air_video AS t2 ON
t1.asset_id = t2.asset_id
WHERE
t1.status NOT IN ('bumpers', 'clock', 'weather')
AND t1.date BETWEEN '2012-04-01' AND '2012-04-30'
GROUP BY
t1.asset_id, t1.date
HAVING
`rotationSpan` > 72
OR `dailyCount` > 24
ORDER BY
`date` ASC,
`rotationSpan` DESC,
`dailyCount` DESC
Problems
The bigger the user-specified date range, the longer the query takes to complete (about 9 seconds for a one-month range).
The lastAir timestamp is not the latest time the video was aired on that particular date but the latest time it was on air overall.

If you need to speed up your query, you need to remove the correlated subquery in the select list (the one starting on line 3 of your query).
To still have that count, you can inner join the table again in the FROM clause, using the same conditions as in the subquery. This is how it should look:
SELECT
t1.date, t1.asset_id, t1.name,
COUNT(DISTINCT t3.id) AS 'dailyCount',
MIN(CONCAT(t2.date, ' ', t2.time)) AS 'firstAir',
MAX(CONCAT(t2.date, ' ', t2.time)) AS 'lastAir',
ROUND(TIMESTAMPDIFF(
MINUTE,
MIN(CONCAT(t2.date, ' ', t2.time)),
MAX(CONCAT(t2.date, ' ', t2.time))
) / 60) as 'rotationSpan'
FROM
air_video AS t1
INNER JOIN
air_video AS t2 ON
(t1.asset_id = t2.asset_id)
INNER JOIN
air_video AS t3
ON (t2.asset_id = t3.asset_id AND t3.date = t1.date)
WHERE
t1.status NOT IN ('bumpers', 'clock', 'weather')
AND t1.date BETWEEN '2012-04-01' AND '2012-04-30'
GROUP BY
t1.asset_id, t1.date
HAVING
`rotationSpan` > 72
OR `dailyCount` > 24
ORDER BY
`date` ASC,
`rotationSpan` DESC,
`dailyCount` DESC
Since t2 is not bound by date, you are obviously looking at the whole table, instead of the date range.
Edit:
Due to a lot of date bindings the query still ran too slowly. I then took a different approach. I created 3 views (which you obviously can combine into a normal query without the views, but I like the end result query better)
--T1--
CREATE VIEW t1 AS select date,asset_id,name from air_video where (status not in ('bumpers','clock','weather')) group by asset_id,date order by date;
--T2--
CREATE VIEW t2 AS select t1.date,t1.asset_id,t1.name,min(concat(t2.date,' ',t2.time)) AS 'firstAir',max(concat(t2.date,' ',t2.time)) AS 'lastAir',round((timestampdiff(MINUTE,min(concat(t2.date,' ',t2.time)),max(concat(t2.date,' ',t2.time))) / 60),0) AS 'rotationSpan' from (t1 join air_video t2 on((t1.asset_id = t2.asset_id))) group by t1.asset_id,t1.date;
--T3--
CREATE VIEW t3 AS select t2.date,t2.asset_id,t2.name,count(t3.asset_id) AS 'dailyCount',t2.firstAir,t2.lastAir,t2.rotationSpan AS rotationSpan from (t2 join air_video t3 on(((t2.asset_id = t3.asset_id) and (t3.date = t2.date)))) group by t2.asset_id,t2.date;
From there you can then just run the following query:
SELECT
date,
asset_id,
name,
dailyCount,
firstAir,
lastAir,
rotationSpan
FROM
t3
WHERE
date BETWEEN '2012-04-01' AND '2012-04-30'
AND (
rotationSpan > 72
OR
dailyCount > 24
)
ORDER BY
date ASC,
rotationSpan DESC,
dailyCount DESC
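For anyone who wants to experiment with the pre-aggregation idea, here is a minimal sketch. It is not the exact views above: it counts airings per (date, asset_id) once, computes the first and last airing per asset once, and joins the two small results. It runs on SQLite through Python's sqlite3 module only so that it is self-contained; julianday() stands in for MySQL's TIMESTAMPDIFF, and all sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE air_video (
        id INTEGER PRIMARY KEY,
        date TEXT NOT NULL,
        time TEXT NOT NULL,
        asset_id INTEGER NOT NULL,
        name TEXT NOT NULL,
        status TEXT,
        UNIQUE (date, time)
    )
""")

# Invented sample data.
rows = [
    # Asset 2: only two airings, but 97 hours apart (rotation too long).
    ("2012-04-01", "20:00:00", 2, "teaser", "video"),
    ("2012-04-05", "21:00:00", 2, "teaser", "video"),
    # Asset 3: two airings one hour apart (compliant).
    ("2012-04-03", "10:00:00", 3, "trailer", "video"),
    ("2012-04-03", "11:00:00", 3, "trailer", "video"),
    # Asset 4: long rotation, but excluded by its status.
    ("2012-04-06", "09:00:00", 4, "ident", "bumpers"),
    ("2012-04-16", "09:00:00", 4, "ident", "bumpers"),
]
# Asset 1: 25 airings on one day, every 30 minutes from 00:00 to 12:00.
rows += [("2012-04-02", "%02d:%02d:00" % divmod(i * 30, 60), 1, "promo", "video")
         for i in range(25)]
conn.executemany(
    "INSERT INTO air_video (date, time, asset_id, name, status) VALUES (?, ?, ?, ?, ?)",
    rows,
)

QUERY = """
SELECT date, asset_id, name, dailyCount, rotationSpan
FROM (
    SELECT d.date AS date, d.asset_id AS asset_id, d.name AS name,
           d.dailyCount AS dailyCount,
           -- hours between first and last airing, rounded
           CAST(ROUND((julianday(s.lastAir) - julianday(s.firstAir)) * 24) AS INTEGER)
               AS rotationSpan
    FROM (
        -- one row per day and asset inside the requested range
        SELECT date, asset_id, MIN(name) AS name, COUNT(*) AS dailyCount
        FROM air_video
        WHERE status NOT IN ('bumpers', 'clock', 'weather')
          AND date BETWEEN ? AND ?
        GROUP BY date, asset_id
    ) AS d
    JOIN (
        -- one row per asset over the whole table
        SELECT asset_id,
               MIN(date || ' ' || time) AS firstAir,
               MAX(date || ' ' || time) AS lastAir
        FROM air_video
        GROUP BY asset_id
    ) AS s ON s.asset_id = d.asset_id
)
WHERE dailyCount > 24 OR rotationSpan > 72
ORDER BY date ASC, rotationSpan DESC, dailyCount DESC
"""

violations = conn.execute(QUERY, ("2012-04-01", "2012-04-30")).fetchall()
for row in violations:
    print(row)
```

Each aggregate scans the table once instead of once per output row, which is the point of the exercise. Whether this beats the views on a real MySQL server would still need checking with EXPLAIN.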

Related

MySql is null vs is not null performance

I have a query where I am basically doing a left outer join and checking if the joined value is null
select count(T1.code)
from ( select code
from asset
where type = 'meter'
and creation_time <= '2022-04-29 00:00:00'
and (deactivation_time > '2022-04-28 00:00:00' or deactivation_time is null )
group by code
) as T1
left join ( select asset_code
from amr_midnight_data
where server_time between '2022-04-28 00:00:00' and '2022-04-29 00:00:00'
group by asset_code
) as T2 on T1.code = T2.asset_code
Where T2.asset_code is null;
This query takes 3 seconds to execute, but if I replace the is null at the end with is not null, it takes less than a second. Why is there a performance difference here, and what alternatives do I have to make my original query faster?
Look at the EXPLAIN. A guess... Changing to IS NOT NULL lets the Optimizer change LEFT JOIN to JOIN, which lets it start with amr_midnight_data which might optimize better.
I think that the LEFT JOIN ( SELECT ... ) .. IS [NOT] NULL can be replaced with
WHERE [NOT] EXISTS ( SELECT 1 FROM amr_midnight_data
WHERE asset_code = T1.code
AND server_time >= '2022-04-28'
AND server_time < '2022-04-28' + INTERVAL 1 DAY )
That would benefit from INDEX(asset_code, server_time).
EXISTS is faster than SELECT .. GROUP BY because it can stop as soon as one matching row is found.
asset would probably benefit from INDEX(type, creation_time) or (to make it "covering"):
INDEX(type, creation_time, deactivation_time, code)
If you wish to discuss further, please provide SHOW CREATE TABLE for both tables and EXPLAIN for each SELECT.
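A small sketch to check that the NOT EXISTS form returns the same count as the LEFT JOIN ... IS NULL form. It uses SQLite via Python's sqlite3 so it can run anywhere, with an explicit end date instead of MySQL's INTERVAL syntax; the index name idx_amr_code_time and the sample rows are made up for illustration. It says nothing about speed on your MySQL server, only that the two queries agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE asset (
        code TEXT, type TEXT, creation_time TEXT, deactivation_time TEXT
    );
    CREATE TABLE amr_midnight_data (asset_code TEXT, server_time TEXT);
    -- hypothetical index name; the column order follows the suggestion above
    CREATE INDEX idx_amr_code_time ON amr_midnight_data (asset_code, server_time);

    INSERT INTO asset VALUES
        ('A', 'meter', '2022-01-01 00:00:00', NULL),
        ('B', 'meter', '2022-01-01 00:00:00', NULL),
        ('C', 'meter', '2022-01-01 00:00:00', NULL),
        ('D', 'valve', '2022-01-01 00:00:00', NULL);
    INSERT INTO amr_midnight_data VALUES
        ('A', '2022-04-28 06:00:00'),   -- inside the window
        ('B', '2022-04-20 06:00:00'),   -- outside the window
        ('D', '2022-04-28 07:00:00');   -- not a meter
""")

LEFT_JOIN_SQL = """
SELECT COUNT(T1.code)
FROM (SELECT code FROM asset
      WHERE type = 'meter'
        AND creation_time <= '2022-04-29 00:00:00'
        AND (deactivation_time > '2022-04-28 00:00:00' OR deactivation_time IS NULL)
      GROUP BY code) AS T1
LEFT JOIN (SELECT asset_code FROM amr_midnight_data
           WHERE server_time BETWEEN '2022-04-28 00:00:00' AND '2022-04-29 00:00:00'
           GROUP BY asset_code) AS T2 ON T1.code = T2.asset_code
WHERE T2.asset_code IS NULL
"""

NOT_EXISTS_SQL = """
SELECT COUNT(DISTINCT a.code)
FROM asset AS a
WHERE a.type = 'meter'
  AND a.creation_time <= '2022-04-29 00:00:00'
  AND (a.deactivation_time > '2022-04-28 00:00:00' OR a.deactivation_time IS NULL)
  AND NOT EXISTS (SELECT 1 FROM amr_midnight_data AS m
                  WHERE m.asset_code = a.code
                    AND m.server_time >= '2022-04-28'
                    AND m.server_time <  '2022-04-29')
"""

left_join_count = conn.execute(LEFT_JOIN_SQL).fetchone()[0]
not_exists_count = conn.execute(NOT_EXISTS_SQL).fetchone()[0]
print(left_join_count, not_exists_count)
```

Note that the original BETWEEN includes the end timestamp while the half-open range does not, so the two forms can differ for a reading stamped exactly at midnight on the 29th.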

Select statement joining 2 tables, searching by date, and status

OK I think I have messed up somewhere but maybe someone can spot my error, because I have little clue of what I am doing.
I have 2 Tables Players and RegionPlayer (see bottom for structure)
I am trying to find regions where none of the players have been seen in a while. Players can be on vacation, which gives them 58 days; otherwise it's only 8 days.
If none of the players on a region have been seen in that time, I want the sql search to return the regionID, as well as the most recent person on that region who was seen.
Now I think that way to do this is to get 2 results from each region, each providing me the most recent player seen who was on vacation, and who was not on vacation.
But while, I thought this would give me that, it doesn't seem to.
SELECT RegionPlayer.Regionid, Players.key, Players.Name, Players.Seen, Players.Vacation
FROM RegionPlayer
JOIN Players
ON Players.Key = RegionPlayer.Playerid
where ( RegionPlayer.Status = 1 )
GROUP BY RegionPlayer.Regionid DESC, Players.Vacation DESC
ORDER BY Players.Seen DESC
Then I am going to need to be able to tell who has not been seen in a while, this should give me that.
Now I know I can link both queries together, but I have no idea how, it has been many years since I last had to put this much effort into sql statements.
Select Players.key FROM Players
WHERE
(( Players.Vacation != 1 ) AND
( Players.Seen <= (NOW() - INTERVAL 8 DAY ) ))
OR
(( Players.Vacation != 0 ) AND
( Players.Seen <= (NOW() - INTERVAL 58 DAY ) ))
Is There a better way of doing this, I sort of remember things like views, and store procedures, and functions, would one or more of them be better?
Table Structure.
Please forgive, the names, of the tables and some of the structure, This is an example of why deciding things late at night after 1/2 a bottle of wine is a bad idea.
CREATE TABLE IF NOT EXISTS `Players` (
`key` int(11) NOT NULL,
`Name` varchar(255) NOT NULL,
`Vacation` varchar(1) NOT NULL,
`Seen` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`Modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
)
CREATE TABLE IF NOT EXISTS `RegionPlayer` (
`Key` int(11) NOT NULL,
`Playerid` int(11) NOT NULL,
`Regionid` int(11) NOT NULL,
`Type` varchar(1) NOT NULL,
`Status` int(1) NOT NULL DEFAULT '1',
`Modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`Created` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00'
)
I've put up an SQLFiddle.
The query that answers your basic requirement, which seems to be: list all regions that have no active player seen in the last 8 days and no vacated player seen in the last 58 days, giving also the data of the last seen player in that region:
SELECT r.*
FROM (
SELECT rp.Regionid, p.Key, p.Name, p.Vacation, p.Seen
FROM RegionPlayer rp
JOIN Players p ON p.Key = rp.Playerid
WHERE rp.Status = 1
GROUP BY rp.Regionid
ORDER BY p.Seen DESC
) r
WHERE ((r.Vacation != 1) AND (r.Seen <= (NOW()-INTERVAL 8 DAY)))
OR ((r.Vacation != 0) AND (r.Seen <= (NOW()-INTERVAL 58 DAY)));
I deduced from your SQL that only RegionPlayer rows with a Status of 1 should be considered.
On the SQLFiddle I've created a few regions with different combinations, and this query does its job.
As to your first SQL statement. You say it doesn't work as expected, but to me it seems to do it... the last seen active player and last seen vacated player for each region. The sorting may not make it very readable, but it does do that.
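If you would rather not depend on how MySQL picks the non-aggregated columns in a GROUP BY, here is one possible deterministic variant. This is a sketch and a different technique from the query above: it assumes a region counts as inactive only when every Status = 1 player is past their own limit (8 or 58 days), and it reports the most recently seen of those players. It runs on SQLite via Python's sqlite3, with a fixed "now" passed in as a parameter instead of NOW() so that the result is reproducible; the players and regions are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Players (`key` INTEGER, Name TEXT, Vacation TEXT, Seen TEXT);
    CREATE TABLE RegionPlayer (Playerid INTEGER, Regionid INTEGER, Status INTEGER);

    INSERT INTO Players VALUES
        (1, 'Ann',  '0', '2014-06-29 00:00:00'),  -- seen yesterday
        (2, 'Bob',  '0', '2014-06-01 00:00:00'),
        (3, 'Cara', '0', '2014-06-10 00:00:00'),
        (4, 'Dan',  '1', '2014-06-01 00:00:00'),  -- on vacation, within 58 days
        (5, 'Eve',  '1', '2014-04-01 00:00:00'),  -- on vacation, past 58 days
        (6, 'Finn', '1', '2014-06-01 00:00:00'),
        (7, 'Gus',  '0', '2014-06-10 00:00:00');
    INSERT INTO RegionPlayer VALUES
        (1, 1, 1), (2, 2, 1), (3, 2, 1), (4, 3, 1), (5, 4, 1), (6, 5, 1), (7, 5, 1);
""")

QUERY = """
SELECT rp.Regionid, p.`key`, p.Name, p.Seen, p.Vacation
FROM RegionPlayer AS rp
JOIN Players AS p ON p.`key` = rp.Playerid
WHERE rp.Status = 1
  -- keep only the most recently seen player of each region
  AND p.Seen = (SELECT MAX(p2.Seen)
                FROM RegionPlayer AS rp2
                JOIN Players AS p2 ON p2.`key` = rp2.Playerid
                WHERE rp2.Regionid = rp.Regionid AND rp2.Status = 1)
  -- and only if nobody in the region is still within their own limit
  AND NOT EXISTS (
      SELECT 1
      FROM RegionPlayer AS rp3
      JOIN Players AS p3 ON p3.`key` = rp3.Playerid
      WHERE rp3.Regionid = rp.Regionid AND rp3.Status = 1
        AND p3.Seen > datetime(:now, CASE WHEN p3.Vacation = '1'
                                          THEN '-58 days' ELSE '-8 days' END))
ORDER BY rp.Regionid
"""

inactive_regions = conn.execute(QUERY, {"now": "2014-06-30 00:00:00"}).fetchall()
for row in inactive_regions:
    print(row)
```

Region 5 is the interesting case: its latest player (Gus) is past the 8-day limit, but Finn is on vacation and still within 58 days, so under the assumption stated above the region is not reported.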
Try this
SELECT RegionPlayer.Regionid, m.key, m.Name, m.Seen, m.Vacation
FROM RegionPlayer
JOIN (SELECT * FROM Players
WHERE
(( Players.Vacation != 1 ) AND
( Players.Seen <= (NOW() - INTERVAL 8 DAY ) ))
OR
(( Players.Vacation != 0 ) AND
( Players.Seen <= (NOW() - INTERVAL 58 DAY ) ))) m
ON m.Key = RegionPlayer.Playerid
where ( RegionPlayer.Status = 1 )
GROUP BY RegionPlayer.Regionid DESC, m.Vacation DESC
ORDER BY m.Seen DESC

MYSQL Selecting oldest date record for each unique event

I have the following two tables
CREATE TABLE IF NOT EXISTS `events` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM;
CREATE TABLE IF NOT EXISTS `events_dates` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`event_id` bigint(20) NOT NULL,
`date` date NOT NULL,
`start_time` time NOT NULL,
`end_time` time NOT NULL,
PRIMARY KEY (`id`),
KEY `event_id` (`event_id`),
KEY `date` (`event_id`)
) ENGINE=MyISAM;
Where the link is event_id
What I want is to retrieve all unique event records with their respective event dates ordered by the smallest date ascending within a certain period
Basically the following query does exactly what I want
SELECT Event.id, Event.title, EventDate.date, EventDate.start_time, EventDate.end_time
FROM
events AS Event
JOIN
events_dates AS EventDate
ON (Event.id = EventDate.event_id AND EventDate.date = (
SELECT MIN(MinEventDate.date) FROM events_dates AS MinEventDate
WHERE MinEventDate.event_id = Event.id AND MinEventDate.date >= CURDATE() # AND `MinEventDate`.`date` < '2013-02-27'
)
)
WHERE
EventDate.date >= CURDATE() # AND `EventDate`.`date` < '2013-02-27'
ORDER BY EventDate.date ASC , EventDate.start_time ASC , EventDate.end_time DESC
LIMIT 20
This query is the result of multiple attempts to improve on my initial version, which used GROUP BY and other subqueries and took 1.5 seconds. It's the fastest one yet, but considering that there are only 1400 event records and 10000 event date records in total, it still takes 400+ ms to process. I also run a count based on this (for paging purposes), which takes a lot of time as well.
Strangely enough omitting the EventDate condition in the main where clause causes this to be even higher 1s+.
Is there anything I can do to improve this or a different approach at the table structure?
Just to clarify for anyone else: the "#" in MySQL starts a comment that runs to the end of the line, so everything after it is ignored and the "AND EventDate.date < '2013-02-27'" condition is not applied. That said, it appears you want a list of all upcoming events that have not yet happened. I would start with a simple "prequery" that grabs each event and its minimum date among the dates that have not passed yet. Then join that result to the other tables to get the rest of the fields you want:
SELECT
E.ID,
E.Title,
ED2.`date`,
ED2.Start_Time,
ED2.End_Time
FROM
( SELECT
ED.Event_ID,
MIN( ED.`date` ) as MinEventDate
from
Event_Dates ED
where
ED.`date` >= curdate()
group by
ED.Event_ID ) PreQuery
JOIN Events E
ON PreQuery.Event_ID = E.ID
JOIN Event_Dates ED2
ON PreQuery.Event_ID = ED2.Event_ID
AND PreQuery.MinEventDate = ED2.`date`
ORDER BY
ED2.`date`,
ED2.Start_Time,
ED2.End_Time DESC
LIMIT 20
Your table has a redundant index on event_id: the same column is indexed twice under different names. Naming an index `date` does not mean that the date column is what gets indexed. The column(s) in parentheses, here ( event_id ), are what the index is built on.
So, I would change your create table to...
KEY `date` ( `event_id`, `date`, `start_time` )
Or, to manually create an index.
Create index ByEventAndDate on Event_Dates ( `event_id`, `date`, `start_time` )
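To try the prequery pattern together with the composite index, here is a small self-contained sketch. It runs on SQLite via Python's sqlite3 rather than MySQL, passes "today" in as a parameter instead of calling CURDATE() so the result is reproducible, and uses invented events; the index name by_event_and_date is only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE events_dates (
        id INTEGER PRIMARY KEY,
        event_id INTEGER NOT NULL,
        date TEXT NOT NULL,
        start_time TEXT NOT NULL,
        end_time TEXT NOT NULL
    );
    -- composite index in the column order suggested above
    CREATE INDEX by_event_and_date ON events_dates (event_id, date, start_time);

    INSERT INTO events VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO events_dates (event_id, date, start_time, end_time) VALUES
        (1, '2013-01-10', '09:00:00', '10:00:00'),  -- already past
        (1, '2013-02-10', '09:00:00', '10:00:00'),  -- next date of Alpha
        (1, '2013-03-01', '09:00:00', '10:00:00'),
        (2, '2013-02-05', '18:00:00', '20:00:00'),  -- next date of Beta
        (3, '2013-01-01', '12:00:00', '13:00:00');  -- Gamma has no upcoming date
""")

QUERY = """
SELECT e.id, e.title, ed2.date, ed2.start_time, ed2.end_time
FROM (
    -- prequery: earliest upcoming date per event
    SELECT ed.event_id, MIN(ed.date) AS min_event_date
    FROM events_dates AS ed
    WHERE ed.date >= ?
    GROUP BY ed.event_id
) AS pre
JOIN events AS e ON e.id = pre.event_id
JOIN events_dates AS ed2
  ON ed2.event_id = pre.event_id AND ed2.date = pre.min_event_date
ORDER BY ed2.date, ed2.start_time, ed2.end_time DESC
LIMIT 20
"""

upcoming = conn.execute(QUERY, ("2013-02-01",)).fetchall()
for row in upcoming:
    print(row)
```

On the real tables, running EXPLAIN on the MySQL version before and after adding the index is the way to confirm that it is actually used.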
If you are talking about optimization, it is helpful to include execution plans when possible.
By the way, try these (if you have not tried them already):
SELECT
Event.id,
Event.title,
EventDate.date,
EventDate.start_time,
EventDate.end_time
FROM
(select e.id, e.title, min(date) as MinDate
from events_dates as ed
join events as e on e.id = ed.event_id
where date >= CURDATE() and date < '2013-02-27'
group by e.id, e.title) as Event
JOIN events_dates AS EventDate ON Event.id = EventDate.event_id
and Event.MinDate = EventDate.date
ORDER BY EventDate.date ASC , EventDate.start_time ASC , EventDate.end_time DESC
LIMIT 20
;
# Assuming that a greater events_dates.id always has a greater events_dates.date:
SELECT
Event.id,
Event.title,
EventDate.date,
EventDate.start_time,
EventDate.end_time
FROM
(select e.id, e.title, min(ed.id) as MinID
from events_dates as ed
join events as e on e.id = ed.event_id
where date >= CURDATE() and date < '2013-02-27'
group by e.id, e.title) as Event
JOIN events_dates AS EventDate ON Event.id = EventDate.event_id
and Event.MinID = EventDate.id
ORDER BY EventDate.date ASC , EventDate.start_time ASC , EventDate.end_time DESC
LIMIT 20

Sub query or Join which is the optimal solution?

I have two tables: a user table and a table where outgoing emails are queued. I want to select the users who have not been online for a certain amount of time and send them an email. If they already received such an email in the last 7 days, or have one scheduled for the next 7 days, they should not be selected.
I have two queries, and I think it would be great if they could be combined using a subquery.
As this is not an area I'm an expert in, I would like to kindly invite you to either:
Build a subquery out of the second query
Make a JOIN and exclude the second query's results
I would be more than happy :)
Thank you for reading
SELECT
`user_id`
FROM
`user`
WHERE
DATEDIFF( CURRENT_DATE(), date_seen ) >= 7
The results of the second query should be excluded
from the query above.
SELECT
`mail_queue_id`,
`mail_id`,
`user_id`,
`status`,
`date_scheduled`,
`date_processed`
FROM
`mail_queue`
WHERE
(
DATEDIFF( CURRENT_DATE(), date_scheduled ) >= 7
OR
DATEDIFF( date_scheduled, CURRENT_DATE() ) <= 7
)
AND
(
`mail_id` = 'inactive_week'
AND
(
`status` = 'AWAITING'
OR
`status` = 'DELIVERED'
)
)
SOLUTION
SELECT
`user_id`
FROM
`user` as T1
WHERE
DATEDIFF( CURRENT_DATE(), date_seen ) >= 7
AND NOT EXISTS
(
SELECT
`user_id`
FROM
`mail_queue` as T2
WHERE
T2.`user_id` = T1.`user_id`
AND
(
(
DATEDIFF( CURRENT_DATE(), date_scheduled ) >= 7
OR
DATEDIFF( date_scheduled, CURRENT_DATE() ) <= 7
)
AND
(
`mail_id` = 'inactive_week'
AND
(
`status` = 'AWAITING'
OR
`status` = 'DELIVERED'
)
)
)
)
You can select the users who match the first criterion (not having logged on in the past seven days) and then "AND" that criterion to another clause using "NOT EXISTS", aliasing the same table:
select * from T where {first criterion}
and not exists
(
select * from T as T2 where T2.userid = T.userid
and ABS( DATEDIFF(datescheduled, CURRENT_DATE()) ) <=7
)
I'm not familiar with the nuances of the mysql DATEDIFF, i.e. whether it matters which date value appears in which position, but the absolute value would make it so that if the user had been sent a notice in the past 7 days or is scheduled to receive a notice in the next seven days, they would satisfy the condition, and thereby fail the NOT EXISTS condition, excluding that user from your final set.
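Here is a runnable sketch of that NOT EXISTS idea with the absolute day difference, using the table and column names from the question. It runs on SQLite through Python's sqlite3, so julianday() differences stand in for MySQL's DATEDIFF, and "today" is a parameter rather than CURRENT_DATE() to keep the result reproducible. The users and queue rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, date_seen TEXT);
    CREATE TABLE mail_queue (
        mail_queue_id INTEGER PRIMARY KEY,
        mail_id TEXT, user_id INTEGER, status TEXT, date_scheduled TEXT
    );

    INSERT INTO user VALUES
        (1, '2023-03-05'),  -- inactive, never mailed
        (2, '2023-03-05'),  -- inactive, mailed 3 days ago
        (3, '2023-03-05'),  -- inactive, mail scheduled in 5 days
        (4, '2023-03-13'),  -- seen 2 days ago, still active
        (5, '2023-02-23'),  -- inactive, last mailed 30 days ago
        (6, '2023-03-05');  -- inactive, only got a different mail
    INSERT INTO mail_queue (mail_id, user_id, status, date_scheduled) VALUES
        ('inactive_week', 2, 'DELIVERED', '2023-03-12'),
        ('inactive_week', 3, 'AWAITING',  '2023-03-20'),
        ('inactive_week', 5, 'DELIVERED', '2023-02-13'),
        ('newsletter',    6, 'DELIVERED', '2023-03-14');
""")

QUERY = """
SELECT u.user_id
FROM user AS u
WHERE julianday(:today) - julianday(u.date_seen) >= 7
  AND NOT EXISTS (
      SELECT 1
      FROM mail_queue AS q
      WHERE q.user_id = u.user_id
        AND q.mail_id = 'inactive_week'
        AND q.status IN ('AWAITING', 'DELIVERED')
        -- within 7 days of today, in either direction
        AND ABS(julianday(q.date_scheduled) - julianday(:today)) <= 7)
ORDER BY u.user_id
"""

to_mail = [row[0] for row in conn.execute(QUERY, {"today": "2023-03-15"})]
print(to_mail)
```

Users 2 and 3 drop out because of a recent or upcoming 'inactive_week' mail, and user 4 because they were seen recently.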

Time interval calculation in time series using SQL

I have a MySQL table like this
CREATE TABLE IF NOT EXISTS `vals` (
`DT` datetime NOT NULL,
`value` INT(11) NOT NULL,
PRIMARY KEY (`DT`)
);
the DT is unique date with time
data sample:
INSERT INTO `vals` (`DT`,`value`) VALUES
('2011-02-05 06:05:00', 300),
('2011-02-05 11:05:00', 250),
('2011-02-05 14:35:00', 145),
('2011-02-05 16:45:00', 100),
('2011-02-05 18:50:00', 125),
('2011-02-05 19:25:00', 100),
('2011-02-05 21:10:00', 125),
('2011-02-06 00:30:00', 150);
I need to get something like this:
start|end|value
NULL,'2011-02-05 06:05:00',300
'2011-02-05 06:05:00','2011-02-05 11:05:00',250
'2011-02-05 11:05:00','2011-02-05 14:35:00',145
'2011-02-05 14:35:00','2011-02-05 16:45:00',100
'2011-02-05 16:45:00','2011-02-05 18:50:00',125
'2011-02-05 18:50:00','2011-02-05 19:25:00',100
'2011-02-05 19:25:00','2011-02-05 21:10:00',125
'2011-02-05 21:10:00','2011-02-06 00:30:00',150
'2011-02-06 00:30:00',NULL,NULL
I tried the following query:
SELECT T1.DT AS `start`,T2.DT AS `stop`, T2.value AS value FROM (
SELECT DT FROM vals
) T1
LEFT JOIN (
SELECT DT,value FROM vals
) T2
ON T2.DT > T1.DT ORDER BY T1.DT ASC
but it returns too many rows (29 instead of 9) and I could not find any way to limit this using SQL. Is it possible in MySQL?
Use a subquery
SELECT
(
select max(T1.DT)
from vals T1
where T1.DT < T2.DT
) AS `start`,
T2.DT AS `stop`,
T2.value AS value
FROM vals T2
ORDER BY T2.DT ASC
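The correlated subquery above is plain SQL, so it can be checked quickly outside MySQL too. A small sketch using SQLite through Python's sqlite3, loaded with the sample data from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vals (DT TEXT PRIMARY KEY, value INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO vals (DT, value) VALUES (?, ?)",
    [
        ("2011-02-05 06:05:00", 300),
        ("2011-02-05 11:05:00", 250),
        ("2011-02-05 14:35:00", 145),
        ("2011-02-05 16:45:00", 100),
        ("2011-02-05 18:50:00", 125),
        ("2011-02-05 19:25:00", 100),
        ("2011-02-05 21:10:00", 125),
        ("2011-02-06 00:30:00", 150),
    ],
)

QUERY = """
SELECT
    (SELECT MAX(T1.DT)          -- latest reading strictly before this one
     FROM vals AS T1
     WHERE T1.DT < T2.DT) AS `start`,
    T2.DT AS `stop`,
    T2.value AS value
FROM vals AS T2
ORDER BY T2.DT ASC
"""

intervals = conn.execute(QUERY).fetchall()
for row in intervals:
    print(row)
```

This gives one row per reading, so 8 rows for the sample, with NULL as the first start. The trailing open-ended row from the desired output (last timestamp, NULL, NULL) is not produced by this query.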
You can also use a MySQL specific solution employing variables
SELECT CAST( @dt AS DATETIME ) AS `start` , @dt := DT AS `stop` , `value`
FROM (SELECT @dt := NULL) dt, vals
ORDER BY dt ASC
But you need to do it carefully:
the ORDER BY must be present, otherwise the variables don't roll over properly
the variable needs to be reset to NULL within the query, using a subquery to set it; otherwise, if you run it twice in a row, the second run will not start with NULL
You can use a server-side variable to simulate it:
select @myvar as start, end, value, @myvar := end as next_rows_start
from vals
Variables are interpreted from left to right in sequence, so the two references to @myvar (start and next_rows_start) will output two different values.
Just remember to reset @myvar to null before and/or after the query, otherwise the second and subsequent runs will have a wrong first row:
select @myvar := null
This would be easier if the table had a running ID column which corresponds to the times in DT (same order). If you don't want to change the table you can use a temp:
drop table if exists temp;
CREATE TABLE temp (
`id` INT(11) AUTO_INCREMENT,
`DT` datetime NOT NULL,
`value` INT(11) NOT NULL,
PRIMARY KEY (`id`)
);
insert into temp (DT,value) select * from vals order by DT asc;
select t1.DT as `start`, t2.DT as `end`, t2.value
from temp t2
left join temp t1 ON t2.id = t1.id + 1;