Select statement joining 2 tables, searching by date, and status - mysql

OK I think I have messed up somewhere but maybe someone can spot my error, because I have little clue of what I am doing.
I have 2 Tables Players and RegionPlayer (see bottom for structure)
I am trying to find when none of the players in a region have been seen in a while. Players can be on vacation, which gives them 58 days; otherwise it's only 8 days.
If none of the players on a region have been seen in that time, I want the sql search to return the regionID, as well as the most recent person on that region who was seen.
Now I think the way to do this is to get 2 results from each region: the most recent player seen who was on vacation, and the most recent player seen who was not.
But while, I thought this would give me that, it doesn't seem to.
SELECT RegionPlayer.Regionid, Players.key, Players.Name, Players.Seen, Players.Vacation
FROM RegionPlayer
JOIN Players
ON Players.Key = RegionPlayer.Playerid
where ( RegionPlayer.Status = 1 )
GROUP BY RegionPlayer.Regionid DESC, Players.Vacation DESC
ORDER BY Players.Seen DESC
Then I am going to need to be able to tell who has not been seen in a while, this should give me that.
Now I know I can link both queries together, but I have no idea how, it has been many years since I last had to put this much effort into sql statements.
Select Players.key FROM Players
WHERE
(( Players.Vacation != 1 ) AND
( Players.Seen <= (NOW() - INTERVAL 8 DAY ) ))
OR
(( Players.Vacation != 0 ) AND
( Players.Seen <= (NOW() - INTERVAL 58 DAY ) ))
Is there a better way of doing this? I sort of remember things like views, stored procedures, and functions; would one or more of them be better?
Table Structure.
Please forgive the names of the tables and some of the structure. This is an example of why deciding things late at night after half a bottle of wine is a bad idea.
CREATE TABLE IF NOT EXISTS `Players` (
`key` int(11) NOT NULL,
`Name` varchar(255) NOT NULL,
`Vacation` varchar(1) NOT NULL,
`Seen` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`Modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
)
CREATE TABLE IF NOT EXISTS `RegionPlayer` (
`Key` int(11) NOT NULL,
`Playerid` int(11) NOT NULL,
`Regionid` int(11) NOT NULL,
`Type` varchar(1) NOT NULL,
`Status` int(1) NOT NULL DEFAULT '1',
`Modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`Created` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00'
)

I've put up an SQLFiddle.
The query that answers your basic requirement, which seems to be: list all regions that have no active (non-vacationing) player seen in the last 8 days and no vacationing player seen in the last 58 days, also giving the data of the last seen player in that region:
SELECT r.*
FROM (
SELECT rp.Regionid, p.Key, p.Name, p.Vacation, p.Seen
FROM RegionPlayer rp
JOIN Players p ON p.Key = rp.Playerid
WHERE rp.Status = 1
GROUP BY rp.Regionid
ORDER BY p.Seen DESC
) r
WHERE ((r.Vacation != 1) AND (r.Seen <= (NOW()-INTERVAL 8 DAY)))
OR ((r.Vacation != 0) AND (r.Seen <= (NOW()-INTERVAL 58 DAY)));
I inferred from your SQL that only RegionPlayer rows with a Status of 1 should be considered.
On the SQLFiddle I've created a few regions with different combinations, and this query does its job.
As to your first SQL statement: you say it doesn't work as expected, but to me it seems to do what you describe, returning the last seen active player and the last seen vacationing player for each region. The sorting may not make it very readable, but it does do that.
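One caveat worth hedging: MySQL is free to pick any row's values for non-aggregated columns in a GROUP BY, and an ORDER BY inside a derived table is not guaranteed to change that. A more explicit sketch of the same requirement follows. It assumes Vacation holds '1' or '0', and it will return more than one row for a region if several players tie on the latest Seen value.

```sql
-- Regions where no Status = 1 player was seen recently enough,
-- together with the most recently seen player of each such region.
SELECT rp.Regionid, p.`key`, p.Name, p.Seen, p.Vacation
FROM RegionPlayer rp
JOIN Players p ON p.`key` = rp.Playerid
WHERE rp.Status = 1
  -- keep only the most recently seen player of the region
  AND p.Seen = (
        SELECT MAX(p2.Seen)
        FROM RegionPlayer rp2
        JOIN Players p2 ON p2.`key` = rp2.Playerid
        WHERE rp2.Regionid = rp.Regionid
          AND rp2.Status = 1
      )
  -- and only if nobody in the region is still within their allowance
  -- (58 days when on vacation, 8 days otherwise)
  AND NOT EXISTS (
        SELECT 1
        FROM RegionPlayer rp3
        JOIN Players p3 ON p3.`key` = rp3.Playerid
        WHERE rp3.Regionid = rp.Regionid
          AND rp3.Status = 1
          AND p3.Seen > DATE_SUB(NOW(), INTERVAL IF(p3.Vacation = '1', 58, 8) DAY)
      );
```

The NOT EXISTS part is simply the negation of the "not seen in a while" filter from the question, applied to the whole region rather than to a single row.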

Try this
SELECT RegionPlayer.Regionid, m.key, m.Name, m.Seen, m.Vacation
FROM RegionPlayer
JOIN (SELECT * FROM Players
WHERE
(( Players.Vacation != 1 ) AND
( Players.Seen <= (NOW() - INTERVAL 8 DAY ) ))
OR
(( Players.Vacation != 0 ) AND
( Players.Seen <= (NOW() - INTERVAL 58 DAY ) ))) m
ON m.Key = RegionPlayer.Playerid
where ( RegionPlayer.Status = 1 )
GROUP BY RegionPlayer.Regionid DESC, m.Vacation DESC
ORDER BY m.Seen DESC

Get rid of the subqueries for the sake of sorting grouped data

Tables
CREATE TABLE `aircrafts_in` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`city_from` int(11) NOT NULL COMMENT 'From',
`city_to` int(11) NOT NULL COMMENT 'To',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=91 DEFAULT CHARSET=utf8 COMMENT='Aircraft by route'
CREATE TABLE `aircrafts_in_parsed_data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`price` int(11) NOT NULL COMMENT 'Price',
`airline` varchar(255) NOT NULL COMMENT 'Airline',
`date` date NOT NULL COMMENT 'Departure date',
`info_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `info_id` (`info_id`),
KEY `price` (`price`),
KEY `date` (`date`)
) ENGINE=InnoDB AUTO_INCREMENT=940682 DEFAULT CHARSET=utf8
date - departure date
CREATE TABLE `aircrafts_in_parsed_info` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`status` enum('success','error') DEFAULT NULL,
`type` enum('roundtrip','oneway') NOT NULL,
`date` datetime NOT NULL COMMENT 'Parse date',
`aircrafts_in_id` int(11) DEFAULT NULL COMMENT 'Route ID',
PRIMARY KEY (`id`),
KEY `aircrafts_in_id` (`aircrafts_in_id`)
) ENGINE=InnoDB AUTO_INCREMENT=577759 DEFAULT CHARSET=utf8
date - creation date (when the data was parsed)
Task
Get the lowest ticket price and its departure date for each month. Be aware that the minimum price must be current (taken from the latest parse), not just the lowest value ever recorded. If multiple dates share the minimum price, we need the first one.
My solution
I think that there's something not quite right.
I don't like using subqueries for grouping. How can I solve this problem without them?
select *
from (
select * from (
select airline,
price,
pdata.`date` as `date`
from aircrafts_in_parsed_data `pdata`
inner join aircrafts_in_parsed_info `pinfo`
on pdata.`info_id` = pinfo.`id`
where pinfo.`aircrafts_in_id` = {$id}
and pinfo.status = 'success'
and pinfo.`type` = 'roundtrip'
and `price` <> 0
group by pdata.`date`, year(pinfo.`date`) desc, month(pinfo.`date`) desc, day(pinfo.`date`) desc
) base
group by `date`
order by price, year(`date`) desc, month(`date`) desc, day(`date`) asc
) minpriceperdate
group by year(`date`) desc, month(`date`) desc
Takes 0.015 s without cache; the table sizes can be seen from the AUTO_INCREMENT values.
SELECT MIN(price) AS min_price,
LEFT(date, 7) AS yyyy_mm
FROM aircrafts_in_parsed_data
GROUP BY LEFT(date, 7)
will get the lowest price for each month. But it can't say 'first'.
From my groupwise-max cheat-sheet, I derive this:
SELECT
yyyy_mm, date, price, airline -- The desired columns
FROM
( SELECT @prev := '' ) init
JOIN
( SELECT LEFT(date, 7) != @prev AS first,
@prev := LEFT(date, 7),
LEFT(date, 7) AS yyyy_mm, date, price, airline
FROM aircrafts_in_parsed_data
ORDER BY
LEFT(date, 7), -- The 'GROUP BY'
price ASC, -- ASC to do "MIN()"
date -- To get the 'first' if there are dup prices for a month
) x
WHERE first -- extract only the first of the lowest price for each month
ORDER BY yyyy_mm; -- Whatever you like
Sorry, but subqueries are necessary. (I avoided YEAR(), MONTH(), and DAY().)
You are right, your query is not correct.
Let's start with the innermost query: You group by pdata.date + pinfo.date, so you get one result row per date combination. As you don't specify which price or airline you are interested in for each date combination (such as MAX(airline) and MIN(price)), you get one airline arbitrarily chosen for a date combination and one price also arbitrarily chosen. These don't even have to belong to the same record in the table; the DBMS is free to choose one airline and one price matching the dates. Well, maybe the date combination of pdata.date and pinfo.date is already unique, but then you wouldn't have to group by at all. So however we look at this, this isn't proper.
In the next query you group by pdata.date only, thus again getting arbitrary matches for airline and price. You could have done that in the innermost query already. It makes no sense to say: "give me a randomly picked price per pdata.date and pinfo.date and from these give me a randomly picked price per pdata.date"; you could just as well say it directly: "give me a randomly picked price per pdata.date". Then you order your result rows. This is completely useless, as you are using the results as a subquery (derived table) again, and such a result is considered an unordered set. So the ORDER BY gives the DBMS more work to do, but is in no way guaranteed to influence the main query's results.
In your main query then you group by year and month, again resulting in arbitrarily picked values.
Here is the same query a tad shorter and cleaner:
select
pdata.airline, -- some arbitrarily chosen airline matching year and month
pdata.price, -- some arbitrarily chosen price matching year and month
pdata.date -- some arbitrarily chosen date matching year and month
from aircrafts_in_parsed_data pdata
inner join aircrafts_in_parsed_info pinfo on pdata.info_id = pinfo.id
where pinfo.aircrafts_in_id = {$id}
and pinfo.status = 'success'
and pinfo.type = 'roundtrip'
and pdata.price <> 0
group by year(pdata.date), month(pdata.date)
order by year(pdata.date) desc, month(pdata.date) desc
As to the original task (as far as I understand it): Find the records with the lowest price per month. Per month means GROUP BY month. The lowest price is MIN(price).
select
min_price_record.departure_year,
min_price_record.departure_month,
min_price_record.min_price,
full_record.departure_date,
full_record.airline
from
(
select
year(`date`) as departure_year,
month(`date`) as departure_month,
min(price) as min_price
from aircrafts_in_parsed_data
where price <> 0
and info_id in
(
select id
from aircrafts_in_parsed_info
where aircrafts_in_id = {$id}
and status = 'success'
and type = 'roundtrip'
)
group by year(`date`), month(`date`)
) min_price_record
join
(
select
`date` as departure_date,
year(`date`) as departure_year,
month(`date`) as departure_month,
price,
airline
from aircrafts_in_parsed_data
where price <> 0
and info_id in
(
select id
from aircrafts_in_parsed_info
where aircrafts_in_id = {$id}
and status = 'success'
and type = 'roundtrip'
)
) full_record on full_record.departure_year = min_price_record.departure_year
and full_record.departure_month = min_price_record.departure_month
and full_record.price = min_price_record.min_price
order by
min_price_record.departure_year desc,
min_price_record.departure_month desc;
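If MySQL 8.0 or later is available, a window function is another way to express "lowest price per month, first date on ties". This is only a sketch under that version assumption: @route_id is a hypothetical stand-in for the {$id} placeholder, and the "most recent parse" part of the task is not handled here.

```sql
-- @route_id is a hypothetical stand-in for the {$id} placeholder.
SET @route_id = 1;

SELECT yyyy_mm, `date`, price, airline
FROM (
    SELECT DATE_FORMAT(pdata.`date`, '%Y-%m') AS yyyy_mm,
           pdata.`date`,
           pdata.price,
           pdata.airline,
           -- rank rows within each month: cheapest first, earliest date on ties
           ROW_NUMBER() OVER (
               PARTITION BY DATE_FORMAT(pdata.`date`, '%Y-%m')
               ORDER BY pdata.price ASC, pdata.`date` ASC
           ) AS rn
    FROM aircrafts_in_parsed_data pdata
    JOIN aircrafts_in_parsed_info pinfo ON pinfo.id = pdata.info_id
    WHERE pinfo.aircrafts_in_id = @route_id
      AND pinfo.status = 'success'
      AND pinfo.`type` = 'roundtrip'
      AND pdata.price <> 0
) ranked
WHERE rn = 1
ORDER BY yyyy_mm DESC;
```

Unlike the join on MIN(price), which returns every record that ties for the monthly minimum, this keeps exactly one row per month.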

How to get a list of available time slots from calendar database

I have an event/calendar MySQL table where each user has multiple appointments/events throughout the day. If one user can't make an appointment/event because he/she is running behind on another appointment, I need to be able to re-assign this appointment to a different available user. So I need to display a suggestion of the top 5 users who are available for the scheduled time frame and can take this appointment; a manager will then be able to re-assign it to one of the suggested users.
My events table looks something like this
CREATE TABLE `calendar_events` (
`event_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`start_on` datetime NOT NULL,
`end_on` datetime NOT NULL,
`subject` varchar(255) NOT NULL,
`event_type` enum('Phone Call','Meeting','Event','Appointment','Other') CHARACTER SET latin1 COLLATE latin1_general_ci NOT NULL DEFAULT 'Phone Call',
`all_day_event` tinyint(1) DEFAULT '0' COMMENT '1 = all day event, 0 = no',
`phone_call_id` int(11) unsigned DEFAULT NULL,
`account_id` int(11) unsigned DEFAULT NULL,
`client_id` int(11) unsigned DEFAULT NULL,
`owner_id` int(11) unsigned NOT NULL,
`created_by` int(11) unsigned NOT NULL,
`created_on` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`modified_by` int(11) unsigned DEFAULT NULL,
`modified_on` datetime DEFAULT NULL,
`event_location` varchar(255) DEFAULT NULL,
`event_notes` varchar(10000) DEFAULT NULL,
`status` tinyint(1) NOT NULL DEFAULT '1' COMMENT '0 = purged, 1 = active, 2=pass, 3 = cancled, 5 = waiting for auditor to be enabled',
PRIMARY KEY (`event_id`),
UNIQUE KEY `phone_call_id` (`phone_call_id`,`account_id`,`client_id`),
KEY `client_id` (`client_id`),
KEY `account_id` (`account_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
So let's say event_id = 100 is assigned to user_id = 2 and scheduled with start_on = '2014-09-21 10:00:00' and end_on = '2014-09-21 10:00:00',
and user_id = 5 has an appointment with start_on = '2014-09-21 11:45:00' and end_on = '2014-09-21 12:30:00',
and user_id = 2 cannot make his appointment that is scheduled for '2014-09-21 10:00:00'. The system will then suggest user_id = 5, as he will be available for the following 105 minutes.
The final data set will need to be
event_id org_owner suggested_owner available_for
100 2 5 105
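The 105 in available_for is just the gap, in minutes, between the start of event 100 and the start of user 5's next appointment. As a quick sanity check of that arithmetic, using the literal timestamps from the example:

```sql
-- TIMESTAMPDIFF(unit, from, to) returns (to - from) in the given unit.
SELECT TIMESTAMPDIFF(MINUTE,
                     '2014-09-21 10:00:00',   -- start_on of event 100
                     '2014-09-21 11:45:00')   -- start_on of user 5's next appointment
       AS available_for;
-- returns 105
```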
The following query gives me a list of all available users from the users table, along with start_on and end_on values if the user has an event scheduled (one user can have multiple records). If start_on is NULL in this query, the user does not have any event; otherwise it returns the start of each event.
So if a user ID appears in the query's result with a NULL value in the start_on column, this user is available all day and should be one of the 5 users to recommend, because he has the highest availability. But if a user has one or more rows in the data set with a non-NULL start_on, we need to look at the start_on that is closest to the event and then recommend the top 5 users with the greatest availability.
SELECT user_id, start_on, end_on, subject
FROM view_users AS su
LEFT JOIN calendar_events AS c ON c.owner_id = su.user_id AND c.start_on NOT BETWEEN '2014-09-30 00:00:00' AND '2014-09-30 23:59:59' AND c.status = 1
WHERE su.is_available_today = 1
How can I extract this data set?
First proposal edited thanks to your help, just need to take care of users that don't have any events (could be achieved with a left join in 't' subquery). This could be improved a lot, but right now I'm a bit tired :)
SELECT
c.event_id, -- Event id
c.owner_id AS org_owner, -- Original owner of event
t.owner_id AS suggested_owner, -- Suggested new user
c.start_on, -- Event start
t.free_from, -- Owner free slot start
t.free_to, -- Owner free slot end
TIME_TO_SEC( TIMEDIFF( t.free_to, c.start_on ) ) /60 AS available_for -- Availability in minutes (diff between event start and free slot end)
FROM calendar_events AS c
-- Join with free slots
LEFT JOIN (
-- Add a slot for beginning, 1900-01-01 to first event start
SELECT * FROM (
SELECT owner_id, '1900-01-01' AS free_from, MIN( start_on ) AS free_to
FROM calendar_events c3
GROUP BY owner_id
) AS deb
UNION
-- select free slots by taking the event end and the following event start
SELECT owner_id, `end_on` AS free_from, (
SELECT start_on
FROM calendar_events c2
WHERE c2.owner_id = c1.owner_id
AND c2.start_on > c1.end_on
ORDER BY c2.start_on
LIMIT 0 , 1
) AS free_to
FROM calendar_events c1
UNION
-- Add a slot for end, last event end to 2100-01-01
SELECT * FROM (
SELECT owner_id, MAX( end_on ) AS free_from, '2100-01-01' AS free_to
FROM calendar_events c3
GROUP BY owner_id
) AS end
) AS t ON t.owner_id <> c.owner_id
-- Join avoid using same user and ensure free slot matches event dates
AND t.free_from <= c.start_on AND t.free_to >= c.end_on
WHERE c.status = 1
AND c.event_id =52
GROUP BY t.owner_id -- To avoid multiple free slots by user
ORDER BY available_for DESC -- Sort to list biggest slots first
LIMIT 0, 5 -- Only five first matching users
Good luck :)
How about this:
SELECT event_id, owner_id, start_on INTO @event_id, @user, @start_on
FROM calendar_events WHERE event_id = 100;
SELECT @event_id event_id,
@user org_owner,
u.user_id suggested_owner, -- taken from users so it is not NULL for users without events
TIMESTAMPDIFF(MINUTE, @start_on, COALESCE(c.min_start, DATE(@start_on) + INTERVAL 18 HOUR)) available_for
FROM
users u
LEFT JOIN
(SELECT
owner_id,
MIN(start_on) AS min_start
FROM
calendar_events
WHERE
(start_on BETWEEN @start_on AND DATE(@start_on) + INTERVAL 18 HOUR)
OR
(start_on BETWEEN DATE(@start_on) AND DATE(@start_on) + INTERVAL 18 HOUR AND all_day_event = 1)
GROUP BY owner_id
) c
ON u.user_id = c.owner_id
WHERE u.user_id <> @user
ORDER BY available_for DESC
LIMIT 5
Maybe you have to adjust the INTERVAL; I just assumed that the day ends at 6 P.M.
Try this:
SELECT
co.event_id,
co.owner_id org_owner,
su.user_id suggested_owner,
ifnull(min((to_seconds(c.start_on) - to_seconds(co.start_on)) / 60), 999) available
FROM calendar_events co
CROSS JOIN view_users su
LEFT JOIN calendar_events c ON c.owner_id = su.user_id
AND c.start_on BETWEEN co.start_on AND date(adddate(co.start_on, 1))
AND c.status = 1
WHERE co.event_id = 100
AND su.is_available_today = 1
GROUP BY 1, 2, 3
ORDER BY 4 DESC
LIMIT 5
Users that have no appointments for the day after the target event get assigned the available value of 999, putting them at the top of the list.
The next event for each user is found using min() over the time gap, and all users are sorted largest time gap first; then LIMIT gives you the top 5.

MYSQL Selecting oldest date record for each unique event

I have the following two tables
CREATE TABLE IF NOT EXISTS `events` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM;
CREATE TABLE IF NOT EXISTS `events_dates` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`event_id` bigint(20) NOT NULL,
`date` date NOT NULL,
`start_time` time NOT NULL,
`end_time` time NOT NULL,
PRIMARY KEY (`id`),
KEY `event_id` (`event_id`),
KEY `date` (`event_id`)
) ENGINE=MyISAM;
Where the link is event_id
What I want is to retrieve all unique event records with their respective event dates ordered by the smallest date ascending within a certain period
Basically the following query does exactly what I want
SELECT Event.id, Event.title, EventDate.date, EventDate.start_time, EventDate.end_time
FROM
events AS Event
JOIN
events_dates AS EventDate
ON (Event.id = EventDate.event_id AND EventDate.date = (
SELECT MIN(MinEventDate.date) FROM events_dates AS MinEventDate
WHERE MinEventDate.event_id = Event.id AND MinEventDate.date >= CURDATE() # AND `MinEventDate`.`date` < '2013-02-27'
)
)
WHERE
EventDate.date >= CURDATE() # AND `EventDate`.`date` < '2013-02-27'
ORDER BY EventDate.date ASC , EventDate.start_time ASC , EventDate.end_time DESC
LIMIT 20
This query is the result of multiple attempts at improving the slow time it initially had (1.5 seconds) when I wanted to use GROUP BY and other subqueries. It's the fastest one yet, but considering that there are 1400 event records and 10000 event date records in total, the query takes 400+ ms to process. I also run a count based on this (for paging purposes), which takes a lot of time as well.
Strangely enough omitting the EventDate condition in the main where clause causes this to be even higher 1s+.
Is there anything I can do to improve this or a different approach at the table structure?
Just to clarify for anyone else: the "#" in MySQL starts a comment that runs to the end of the line, so everything after it is ignored in the query; the "AND EventDate.date < '2013-02-27'" condition is not applied. That said, it appears you want a list of all upcoming events that have not yet happened. I would start with a simple "prequery" that just grabs each event and its minimum date among the dates that have not happened yet. Then join that result to the other tables to get the rest of the fields you want:
SELECT
E.ID,
E.Title,
ED2.`date`,
ED2.Start_Time,
ED2.End_Time
FROM
( SELECT
ED.Event_ID,
MIN( ED.`date` ) as MinEventDate
from
events_dates ED
where
ED.`date` >= curdate()
group by
ED.Event_ID ) PreQuery
JOIN events E
ON PreQuery.Event_ID = E.ID
JOIN events_dates ED2
ON PreQuery.Event_ID = ED2.Event_ID
AND PreQuery.MinEventDate = ED2.`date`
ORDER BY
ED2.`date`,
ED2.Start_Time,
ED2.End_Time DESC
LIMIT 20
Your table has a redundant index on event ID, just under a different name. Naming an index `date` does not mean that is the column being indexed; the column(s) in parentheses (event_id) are what the index is built on.
So, I would change your create table to...
KEY `date` ( `event_id`, `date`, `start_time` )
Or, to create the index manually:
CREATE INDEX ByEventAndDate ON events_dates ( `event_id`, `date`, `start_time` )
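Whether a composite index like this actually gets picked up can be checked rather than assumed. A minimal sketch, using the table names from the question: list the indexes on the table, then ask MySQL for the plan of the prequery.

```sql
-- Show which indexes currently exist on the table and the columns they cover.
SHOW INDEX FROM events_dates;

-- Ask the optimizer how it would run the per-event minimum-date prequery.
-- The `key` column of the output names the index chosen (NULL means none).
EXPLAIN
SELECT ed.event_id, MIN(ed.`date`) AS MinEventDate
FROM events_dates ed
WHERE ed.`date` >= CURDATE()
GROUP BY ed.event_id;
```

Comparing the EXPLAIN output before and after adding the index is usually more informative than timing a single run.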
If you are talking about optimization, it is helpful to include execution plans when possible.
By the way, try these (if you have not tried them already):
SELECT
Event.id,
Event.title,
EventDate.date,
EventDate.start_time,
EventDate.end_time
FROM
(select e.id, e.title, min(date) as MinDate
from events_dates as ed
join events as e on e.id = ed.event_id
where date >= CURDATE() and date < '2013-02-27'
group by e.id, e.title) as Event
JOIN events_dates AS EventDate ON Event.id = EventDate.event_id
and Event.MinDate = EventDate.date
ORDER BY EventDate.date ASC , EventDate.start_time ASC , EventDate.end_time DESC
LIMIT 20
;
# assuming that a greater events_dates.id always has a greater events_dates.date
SELECT
Event.id,
Event.title,
EventDate.date,
EventDate.start_time,
EventDate.end_time
FROM
(select e.id, e.title, min(ed.id) as MinID
from events_dates as ed
join events as e on e.id = ed.event_id
where date >= CURDATE() and date < '2013-02-27'
group by e.id, e.title) as Event
JOIN events_dates AS EventDate ON Event.id = EventDate.event_id
and Event.MinID = EventDate.id
ORDER BY EventDate.date ASC , EventDate.start_time ASC , EventDate.end_time DESC
LIMIT 20

more efficient group by for query with Case

I have the following query building a recordset which is used in a pie-chart as a report.
It's not run particularly often, but when it does it takes several seconds, and I'm wondering if there's any way to make it more efficient.
SELECT
CASE
WHEN (lastStatus IS NULL) THEN 'Unused'
WHEN (attempts > 3 AND callbackAfter IS NULL) THEN 'Max Attempts Reached'
WHEN (callbackAfter IS NOT NULL AND callbackAfter > DATE_ADD(NOW(), INTERVAL 7 DAY)) THEN 'Call Back After 7 Days'
WHEN (callbackAfter IS NOT NULL AND callbackAfter <= DATE_ADD(NOW(), INTERVAL 7 DAY)) THEN 'Call Back Within 7 Days'
WHEN (archived = 0) THEN 'Call Back Within 7 Days'
ELSE 'Spoke To'
END AS statusSummary,
COUNT(leadId) AS total
FROM
CO_Lead
WHERE
groupId = 123
AND
deleted = 0
GROUP BY
statusSummary
ORDER BY
total DESC;
I have an index for (groupId, deleted), but I'm not sure it would help to add any of the other fields into the index (if it would, how do I decide which should go first? callbackAfter because it's used the most?)
The table has about 500,000 rows (but will have 10 times that a year from now.)
The only other thing I could think of was to split it out into 6 queries (with the WHEN clause moved into the WHERE), but that makes it take 3 times as long.
EDIT:
Here's the table definition
CREATE TABLE CO_Lead (
objectId int UNSIGNED NOT NULL AUTO_INCREMENT,
groupId int UNSIGNED NOT NULL,
numberToCall varchar(20) NOT NULL,
firstName varchar(100) NOT NULL,
lastName varchar(100) NOT NULL,
attempts tinyint NOT NULL default 0,
callbackAfter datetime NULL,
lastStatus varchar(30) NULL,
createdDate datetime NOT NULL,
archived bool NOT NULL default 0,
deleted bool NOT NULL default 0,
PRIMARY KEY (
objectId
)
) ENGINE = InnoDB;
ALTER TABLE CO_Lead ADD CONSTRAINT UQIX_CO_Lead UNIQUE INDEX (
objectId
);
ALTER TABLE CO_Lead ADD INDEX (
groupId,
archived,
deleted,
callbackAfter,
attempts
);
ALTER TABLE CO_Lead ADD INDEX (
groupId,
deleted,
createdDate,
lastStatus
);
ALTER TABLE CO_Lead ADD INDEX (
firstName
);
ALTER TABLE CO_Lead ADD INDEX (
lastName
);
ALTER TABLE CO_Lead ADD INDEX (
lastStatus
);
ALTER TABLE CO_Lead ADD INDEX (
createdDate
);
Notes:
1. If leadId cannot be NULL, change COUNT(leadId) to COUNT(*). They are logically equivalent, but most versions of the MySQL optimizer are not clever enough to recognize that.
2. Remove the two redundant callbackAfter IS NOT NULL conditions. If callbackAfter satisfies the second part, it cannot be NULL anyway.
3. You could benefit from splitting the query into 6 parts and adding appropriate indexes for each one, but depending on whether the conditions in the CASE overlap, you may get wrong results.
A possible rewrite (mind the different format and check if this returns the same results, it may not!)
SELECT
cnt1 AS "Unused"
, cnt2 AS "Max Attempts Reached"
, cnt3 AS "Call Back After 7 Days"
, cnt4 AS "Call Back Within 7 Days"
, cnt5 AS "Call Back Within 7 Days"
, cnt6 - (cnt1+cnt2+cnt3+cnt4+cnt5) AS "Spoke To"
FROM
( SELECT
( SELECT COUNT(*) FROM CO_Lead
WHERE groupId = 123 AND deleted = 0
AND lastStatus IS NULL
) AS cnt1
, ( SELECT COUNT(*) FROM CO_Lead
WHERE groupId = 123 AND deleted = 0
AND attempts > 3 AND callbackAfter IS NULL
) AS cnt2
, ( SELECT COUNT(*) FROM CO_Lead
WHERE groupId = 123 AND deleted = 0
AND callbackAfter > DATE_ADD(NOW(), INTERVAL 7 DAY)
) AS cnt3
, ( SELECT COUNT(*) FROM CO_Lead
WHERE groupId = 123 AND deleted = 0
AND callbackAfter <= DATE_ADD(NOW(), INTERVAL 7 DAY)
) AS cnt4
, ( SELECT COUNT(*) FROM CO_Lead
WHERE groupId = 123 AND deleted = 0
AND archived = 0
) AS cnt5
, ( SELECT COUNT(*) FROM CO_Lead
WHERE groupId = 123 AND deleted = 0
) AS cnt6
) AS tmp ;
If it does return correct results, you could add indexes to be used for each one of the subqueries:
For subquery 1: (groupId, deleted, lastStatus)
For subquery 2, 3, 4: (groupId, deleted, callbackAfter, attempts)
For subquery 5: (groupId, deleted, archived)
Another approach would be to keep the query you have (minding only notes 1 and 2 above) and add a wide covering index:
(groupId, deleted, lastStatus, callbackAfter, attempts, archived)
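As a sketch of that last approach: the original query with the COUNT(*) and redundant IS NOT NULL changes applied, plus the wide index. The index name ix_lead_status_summary is made up for illustration, and COUNT(*) only matches COUNT(leadId) if leadId can never be NULL.

```sql
-- Hypothetical index name; the column order follows the suggestion above.
ALTER TABLE CO_Lead
    ADD INDEX ix_lead_status_summary
        (groupId, deleted, lastStatus, callbackAfter, attempts, archived);

SELECT
    CASE
        WHEN lastStatus IS NULL THEN 'Unused'
        WHEN attempts > 3 AND callbackAfter IS NULL THEN 'Max Attempts Reached'
        -- a NULL callbackAfter fails both comparisons, so no IS NOT NULL test is needed
        WHEN callbackAfter > DATE_ADD(NOW(), INTERVAL 7 DAY) THEN 'Call Back After 7 Days'
        WHEN callbackAfter <= DATE_ADD(NOW(), INTERVAL 7 DAY) THEN 'Call Back Within 7 Days'
        WHEN archived = 0 THEN 'Call Back Within 7 Days'
        ELSE 'Spoke To'
    END AS statusSummary,
    COUNT(*) AS total
FROM CO_Lead
WHERE groupId = 123
  AND deleted = 0
GROUP BY statusSummary
ORDER BY total DESC;
```

Because every column the query touches is in the index, MySQL may be able to answer it from the index alone; EXPLAIN should show "Using index" if it does.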
Try removing the index to see if this improves the performance.
Indexes do not necessarily improve performance. When MySQL uses a non-covering index, it reads the index and then has to fetch the data from each page the index points to. Those page reads are random rather than sequential, and this random reading can reduce performance on a query that has to read most of the pages anyway.

Sub query or Join which is the optimal solution?

I have 2 tables: a user table and a table where the outgoing emails are queued. I want to select the users who have not been online for a certain amount of time and send them an email. I also want users who already received such an email in the last 7 days, or who have an email scheduled for the next 7 days, not to be selected.
I have 2 queries, which I think would work well if they were combined using subqueries.
As this is an area in which I'm not an expert, I would like to kindly invite you to either
Build a subquery out of the second query, or
Make a JOIN and exclude the second query's results.
I would be more than happy :)
Thank you for reading
SELECT
`user_id`
FROM
`user`
WHERE
DATEDIFF( CURRENT_DATE(), date_seen ) >= 7
The results of the second query should be excluded
from the query above.
SELECT
`mail_queue_id`,
`mail_id`,
`user_id`,
`status`,
`date_scheduled`,
`date_processed`
FROM
`mail_queue`
WHERE
(
DATEDIFF( CURRENT_DATE(), date_scheduled ) >= 7
OR
DATEDIFF( date_scheduled, CURRENT_DATE() ) <= 7
)
AND
(
`mail_id` = 'inactive_week'
AND
(
`status` = 'AWAITING'
OR
`status` = 'DELIVERED'
)
)
SOLUTION
SELECT
`user_id`
FROM
`user` as T1
WHERE
DATEDIFF( CURRENT_DATE(), date_seen ) >= 7
AND NOT EXISTS
(
SELECT
`user_id`
FROM
`mail_queue` as T2
WHERE
T2.`user_id` = T1.`user_id`
AND
(
DATEDIFF( CURRENT_DATE(), date_scheduled ) >= 7
OR
DATEDIFF( date_scheduled, CURRENT_DATE() ) <= 7
)
AND
(
`mail_id` = 'inactive_week'
AND
(
`status` = 'AWAITING'
OR
`status` = 'DELIVERED'
)
)
)
You can select the users who match the first criterion (not having logged on in the past seven days) and then "AND" that criterion to another clause using "NOT EXISTS", aliasing the same table:
select * from T where {first criterion}
and not exists
(
select * from T as T2 where T2.userid = T.userid
and ABS( DATEDIFF(datescheduled, CURRENT_DATE()) ) <=7
)
I'm not familiar with the nuances of the mysql DATEDIFF, i.e. whether it matters which date value appears in which position, but the absolute value would make it so that if the user had been sent a notice in the past 7 days or is scheduled to receive a notice in the next seven days, they would satisfy the condition, and thereby fail the NOT EXISTS condition, excluding that user from your final set.
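For what it's worth, DATEDIFF(a, b) in MySQL returns a minus b in whole days, so swapping the arguments only flips the sign and ABS() covers both directions. A sketch of the idea with the table and column names from the question, in both forms the asker requested; the second form assumes mail_queue_id is never NULL (for example, because it is the primary key):

```sql
-- Variant 1: correlated NOT EXISTS subquery.
SELECT u.`user_id`
FROM `user` AS u
WHERE DATEDIFF(CURRENT_DATE(), u.date_seen) >= 7
  AND NOT EXISTS (
        SELECT 1
        FROM `mail_queue` AS q
        WHERE q.`user_id` = u.`user_id`
          AND q.`mail_id` = 'inactive_week'
          AND q.`status` IN ('AWAITING', 'DELIVERED')
          -- sent within the last 7 days or scheduled within the next 7
          AND ABS(DATEDIFF(q.date_scheduled, CURRENT_DATE())) <= 7
      );

-- Variant 2: anti-join, keeping only users with no matching queue row.
SELECT u.`user_id`
FROM `user` AS u
LEFT JOIN `mail_queue` AS q
       ON q.`user_id` = u.`user_id`
      AND q.`mail_id` = 'inactive_week'
      AND q.`status` IN ('AWAITING', 'DELIVERED')
      AND ABS(DATEDIFF(q.date_scheduled, CURRENT_DATE())) <= 7
WHERE DATEDIFF(CURRENT_DATE(), u.date_seen) >= 7
  AND q.`mail_queue_id` IS NULL;
```

Both should return the same users; which one runs faster depends on the data and indexes, so it is worth comparing them with EXPLAIN.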