I have the following query, which builds a recordset used in a pie chart for a report.
It's not run particularly often, but when it is, it takes several seconds, and I'm wondering if there's any way to make it more efficient.
SELECT
CASE
WHEN (lastStatus IS NULL) THEN 'Unused'
WHEN (attempts > 3 AND callbackAfter IS NULL) THEN 'Max Attempts Reached'
WHEN (callbackAfter IS NOT NULL AND callbackAfter > DATE_ADD(NOW(), INTERVAL 7 DAY)) THEN 'Call Back After 7 Days'
WHEN (callbackAfter IS NOT NULL AND callbackAfter <= DATE_ADD(NOW(), INTERVAL 7 DAY)) THEN 'Call Back Within 7 Days'
WHEN (archived = 0) THEN 'Call Back Within 7 Days'
ELSE 'Spoke To'
END AS statusSummary,
COUNT(leadId) AS total
FROM
CO_Lead
WHERE
groupId = 123
AND
deleted = 0
GROUP BY
statusSummary
ORDER BY
total DESC;
I have an index for (groupId, deleted), but I'm not sure it would help to add any of the other fields into the index (if it would, how do I decide which should go first? callbackAfter because it's used the most?)
The table has about 500,000 rows (but will have 10 times that a year from now.)
The only other thing I could think of was to split it out into 6 queries (with the WHEN clause moved into the WHERE), but that makes it take 3 times as long.
EDIT:
Here's the table definition
CREATE TABLE CO_Lead (
objectId int UNSIGNED NOT NULL AUTO_INCREMENT,
groupId int UNSIGNED NOT NULL,
numberToCall varchar(20) NOT NULL,
firstName varchar(100) NOT NULL,
lastName varchar(100) NOT NULL,
attempts tinyint NOT NULL default 0,
callbackAfter datetime NULL,
lastStatus varchar(30) NULL,
createdDate datetime NOT NULL,
archived bool NOT NULL default 0,
deleted bool NOT NULL default 0,
PRIMARY KEY (
objectId
)
) ENGINE = InnoDB;
ALTER TABLE CO_Lead ADD CONSTRAINT UQIX_CO_Lead UNIQUE INDEX (
objectId
);
ALTER TABLE CO_Lead ADD INDEX (
groupId,
archived,
deleted,
callbackAfter,
attempts
);
ALTER TABLE CO_Lead ADD INDEX (
groupId,
deleted,
createdDate,
lastStatus
);
ALTER TABLE CO_Lead ADD INDEX (
firstName
);
ALTER TABLE CO_Lead ADD INDEX (
lastName
);
ALTER TABLE CO_Lead ADD INDEX (
lastStatus
);
ALTER TABLE CO_Lead ADD INDEX (
createdDate
);
Notes:
1. If leadId cannot be NULL, change COUNT(leadId) to COUNT(*). They are logically equivalent, but most versions of the MySQL optimizer are not clever enough to recognize that.
2. Remove the two redundant callbackAfter IS NOT NULL conditions. If callbackAfter satisfies the comparison that follows, it cannot be NULL anyway.
3. You could benefit from splitting the query into 6 parts and adding appropriate indexes for each one, but depending on whether the conditions in the CASE overlap, the results may or may not be correct.
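Note 3 can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL. SQLite has no DATE_ADD, so the 7-day cut-off is passed in as a parameter. The function name, the reduced column set and the sample rows are all hypothetical. A CASE expression assigns each row to the first WHEN that matches, whereas independent COUNT(*) queries count a row once for every condition it satisfies.

```python
import sqlite3

# The five WHEN conditions from the question, in order (7-day cut-off passed in as :cutoff).
CONDITIONS = [
    "lastStatus IS NULL",
    "attempts > 3 AND callbackAfter IS NULL",
    "callbackAfter > :cutoff",
    "callbackAfter <= :cutoff",
    "archived = 0",
]

def compare_case_vs_split(rows, cutoff):
    """rows: (lastStatus, attempts, callbackAfter, archived) tuples for one group."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CO_Lead (lastStatus TEXT, attempts INTEGER, "
                 "callbackAfter TEXT, archived INTEGER)")
    conn.executemany("INSERT INTO CO_Lead VALUES (?, ?, ?, ?)", rows)
    params = {"cutoff": cutoff}

    # Single pass: CASE puts every row in exactly one bucket (first matching WHEN wins).
    case_counts = dict(conn.execute("""
        SELECT CASE
                 WHEN lastStatus IS NULL THEN 'Unused'
                 WHEN attempts > 3 AND callbackAfter IS NULL THEN 'Max Attempts Reached'
                 WHEN callbackAfter > :cutoff THEN 'Call Back After 7 Days'
                 WHEN callbackAfter <= :cutoff THEN 'Call Back Within 7 Days'
                 WHEN archived = 0 THEN 'Call Back Within 7 Days'
                 ELSE 'Spoke To'
               END AS statusSummary,
               COUNT(*) AS total
        FROM CO_Lead
        GROUP BY statusSummary
    """, params).fetchall())

    # Split approach: every condition is counted independently, so a row that
    # matches two conditions is counted twice and "Spoke To" is under-counted.
    split = [conn.execute(f"SELECT COUNT(*) FROM CO_Lead WHERE {cond}", params).fetchone()[0]
             for cond in CONDITIONS]
    total = conn.execute("SELECT COUNT(*) FROM CO_Lead").fetchone()[0]
    conn.close()
    return case_counts, total - sum(split)
```

Suppose one row has a NULL lastStatus and is not archived, and another has a status and is archived. The CASE reports one "Unused" and one "Spoke To". The subtraction trick reports zero "Spoke To", because the first row is counted under both "Unused" and "archived = 0".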
A possible rewrite (note the different output format, and check whether it returns the same results; it may not!):
SELECT
cnt1 AS "Unused"
, cnt2 AS "Max Attempts Reached"
, cnt3 AS "Call Back After 7 Days"
, cnt4 AS "Call Back Within 7 Days"
, cnt5 AS "Call Back Within 7 Days"
, cnt6 - (cnt1+cnt2+cnt3+cnt4+cnt5) AS "Spoke To"
FROM
( SELECT
( SELECT COUNT(*) FROM CO_Lead
WHERE groupId = 123 AND deleted = 0
AND lastStatus IS NULL
) AS cnt1
, ( SELECT COUNT(*) FROM CO_Lead
WHERE groupId = 123 AND deleted = 0
AND attempts > 3 AND callbackAfter IS NULL
) AS cnt2
, ( SELECT COUNT(*) FROM CO_Lead
WHERE groupId = 123 AND deleted = 0
AND callbackAfter > DATE_ADD(NOW(), INTERVAL 7 DAY)
) AS cnt3
, ( SELECT COUNT(*) FROM CO_Lead
WHERE groupId = 123 AND deleted = 0
AND callbackAfter <= DATE_ADD(NOW(), INTERVAL 7 DAY)
) AS cnt4
, ( SELECT COUNT(*) FROM CO_Lead
WHERE groupId = 123 AND deleted = 0
AND archived = 0
) AS cnt5
, ( SELECT COUNT(*) FROM CO_Lead
WHERE groupId = 123 AND deleted = 0
) AS cnt6
) AS tmp ;
If it does return correct results, you could add indexes to be used for each one of the subqueries:
For subquery 1: (groupId, deleted, lastStatus)
For subqueries 2, 3, and 4: (groupId, deleted, callbackAfter, attempts)
For subquery 5: (groupId, deleted, archived)
Another approach would be to keep the query you have (minding only notes 1 and 2 above) and add a wide covering index:
(groupId, deleted, lastStatus, callbackAfter, attempts, archived)
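A covering index can be checked with the query planner. The sketch below does this in SQLite through Python's sqlite3 module, as an analogy only. In MySQL the equivalent signal is "Using index" in the Extra column of EXPLAIN. The index name idx_lead_wide is hypothetical, and datetime('now', '+7 days') stands in for DATE_ADD(NOW(), INTERVAL 7 DAY).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CO_Lead (
        objectId      INTEGER PRIMARY KEY,
        groupId       INTEGER NOT NULL,
        deleted       INTEGER NOT NULL DEFAULT 0,
        lastStatus    TEXT,
        callbackAfter TEXT,
        attempts      INTEGER NOT NULL DEFAULT 0,
        archived      INTEGER NOT NULL DEFAULT 0
    )""")
# The wide covering index suggested above: equality columns first, then every
# other column the CASE reads, so no table rows need to be fetched.
conn.execute("CREATE INDEX idx_lead_wide ON CO_Lead "
             "(groupId, deleted, lastStatus, callbackAfter, attempts, archived)")

SUMMARY_SQL = """
    SELECT CASE
             WHEN lastStatus IS NULL THEN 'Unused'
             WHEN attempts > 3 AND callbackAfter IS NULL THEN 'Max Attempts Reached'
             WHEN callbackAfter > datetime('now', '+7 days') THEN 'Call Back After 7 Days'
             WHEN callbackAfter <= datetime('now', '+7 days') THEN 'Call Back Within 7 Days'
             WHEN archived = 0 THEN 'Call Back Within 7 Days'
             ELSE 'Spoke To'
           END AS statusSummary,
           COUNT(*) AS total
    FROM CO_Lead
    WHERE groupId = ? AND deleted = 0
    GROUP BY statusSummary
    ORDER BY total DESC
"""

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + SUMMARY_SQL, (123,))]
for line in plan:
    print(line)
```

The plan should mention "COVERING INDEX idx_lead_wide". That means every column the query needs is read from the index, with no lookups into the table rows.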
Try removing the index to see if this improves the performance.
Indexes do not necessarily improve performance. If MySQL decides to use the index here, it will read the index and then have to read the data from each page. Those page reads are random rather than sequential, and random reads can reduce performance on a query that has to read all the pages anyway.
I have the following schema:
create table myapp_task
(
title varchar(100) not null,
state varchar(11) not null,
estimate date not null,
my_id int not null auto_increment
primary key,
road_map_id int not null,
create_date date not null,
constraint myapp_task_road_map_id_5e114978_fk_myapp_roadmap_rd_id
foreign key (road_map_id) references myapp_roadmap (rd_id)
);
-- auto-generated definition
create table myapp_roadmap
(
rd_id int not null auto_increment
primary key,
name varchar(50) not null
);
I want to get the week number and the beginning and end of the week for create_date, plus the number of all tasks and the number of ready tasks (state = 'ready' or 'in_progress').
Here is my query:
select DISTINCT week(create_date, 1) as week,
SUBDATE(create_date, WEEKDAY(create_date)) as beginofweek,
DATE(create_date + INTERVAL (6 - WEEKDAY(create_date)) DAY) as endofweek,
SUM(state) as number,
SUM(state = 'ready') as ready
from myapp_task
inner join myapp_roadmap
on myapp_task.road_map_id = myapp_roadmap.rd_id;
Actually, my only problem is with the count of ready tasks.
I think you are close:
select week(create_date, 1) as week,
SUBDATE(create_date, WEEKDAY(create_date)) as beginofweek,
DATE(create_date + INTERVAL (6 - WEEKDAY(create_date)) DAY) as endofweek,
count(state) as number,
SUM(CASE WHEN state = 'ready' THEN 1 ELSE 0 END) as ready,
SUM(CASE WHEN state = 'in_progress' THEN 1 ELSE 0 END) as in_progress
FROM myapp_task inner join myapp_roadmap
on myapp_task.road_map_id = myapp_roadmap.rd_id
GROUP BY week, beginofweek, endofweek
Using a CASE expression, you can add up the states that are ready or in_progress separately. Furthermore, adding a GROUP BY ensures that the counts are per week. Without the GROUP BY, MySQL aggregates all rows into a single result row and takes the week columns from an arbitrary row, so don't make it guess at what you want. Also, if you upgrade to MySQL 5.7+, a query like this written without a GROUP BY will raise an error by default (ONLY_FULL_GROUP_BY).
I also got rid of that DISTINCT modifier. Thanks @AaronDietz.
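The conditional-sum approach can be tried with a minimal sketch in Python's sqlite3. SQLite has no WEEK(), WEEKDAY() or SUBDATE(), so the Monday of each week is computed with strftime('%w') and date(), and the week number itself is omitted. The function name and sample tasks are hypothetical.

```python
import sqlite3

def weekly_task_counts(tasks):
    """tasks: (title, state, create_date 'YYYY-MM-DD') tuples, all on one road map."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE myapp_roadmap (rd_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE myapp_task (
            my_id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            state TEXT NOT NULL,
            road_map_id INTEGER NOT NULL REFERENCES myapp_roadmap (rd_id),
            create_date TEXT NOT NULL
        );
        INSERT INTO myapp_roadmap (rd_id, name) VALUES (1, 'demo');
    """)
    conn.executemany("INSERT INTO myapp_task (title, state, road_map_id, create_date) "
                     "VALUES (?, ?, 1, ?)", tasks)
    rows = conn.execute("""
        SELECT beginofweek, date(beginofweek, '+6 days') AS endofweek,
               number, ready, in_progress
        FROM (
            SELECT
                -- Monday of the week: strftime('%w') is 0 for Sunday .. 6 for Saturday
                date(t.create_date,
                     '-' || ((CAST(strftime('%w', t.create_date) AS INTEGER) + 6) % 7) || ' days')
                    AS beginofweek,
                COUNT(*) AS number,
                SUM(CASE WHEN t.state = 'ready' THEN 1 ELSE 0 END) AS ready,
                SUM(CASE WHEN t.state = 'in_progress' THEN 1 ELSE 0 END) AS in_progress
            FROM myapp_task t
            JOIN myapp_roadmap r ON t.road_map_id = r.rd_id
            GROUP BY beginofweek
        )
        ORDER BY beginofweek
    """).fetchall()
    conn.close()
    return rows
```

Each result row is (beginofweek, endofweek, number, ready, in_progress), so a single pass over the table yields all the per-week totals.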
You should look up the use of aggregate functions.
COUNT is the function that returns the number of rows. To get two values, the total number of states and the number of those equal to 'ready', you need to join the table twice with different join conditions.
The columns that are then not aggregated need to be included in a GROUP BY clause.
select DISTINCT week(r1.create_date, 1) as week,
SUBDATE(r1.create_date, WEEKDAY(r1.create_date)) as beginofweek,
DATE(r1.create_date + INTERVAL (6 - WEEKDAY(r1.create_date)) DAY) as endofweek,
COUNT(r1.state) AS number,
COUNT(r2.state) AS ready
from myapp_roadmap inner join myapp_task r1
on r1.road_map_id = myapp_roadmap.rd_id
inner join myapp_task r2
on r2.road_map_id = myapp_roadmap.rd_id and r2.state = 'ready'
group by week, beginofweek, endofweek
Tables
CREATE TABLE `aircrafts_in` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`city_from` int(11) NOT NULL COMMENT 'From',
`city_to` int(11) NOT NULL COMMENT 'To',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=91 DEFAULT CHARSET=utf8 COMMENT='Aircraft by route'
CREATE TABLE `aircrafts_in_parsed_data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`price` int(11) NOT NULL COMMENT 'Price',
`airline` varchar(255) NOT NULL COMMENT 'Airline',
`date` date NOT NULL COMMENT 'Departure date',
`info_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `info_id` (`info_id`),
KEY `price` (`price`),
KEY `date` (`date`)
) ENGINE=InnoDB AUTO_INCREMENT=940682 DEFAULT CHARSET=utf8
date - departure date
CREATE TABLE `aircrafts_in_parsed_info` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`status` enum('success','error') DEFAULT NULL,
`type` enum('roundtrip','oneway') NOT NULL,
`date` datetime NOT NULL COMMENT 'Parse date',
`aircrafts_in_id` int(11) DEFAULT NULL COMMENT 'Route ID',
PRIMARY KEY (`id`),
KEY `aircrafts_in_id` (`aircrafts_in_id`)
) ENGINE=InnoDB AUTO_INCREMENT=577759 DEFAULT CHARSET=utf8
date - creation date, i.e. when the data was parsed
Task
Get the lowest ticket price and its departure date for each month. Be aware that the current (most up-to-date) minimum price is what matters, not just any minimum. If multiple dates share the minimum price, we need the first one.
My solution
I think that there's something not quite right.
I don't like using subqueries for grouping. How can I solve this problem?
select *
from (
select * from (
select airline,
price,
pdata.`date` as `date`
from aircrafts_in_parsed_data `pdata`
inner join aircrafts_in_parsed_info `pinfo`
on pdata.`info_id` = pinfo.`id`
where pinfo.`aircrafts_in_id` = {$id}
and pinfo.status = 'success'
and pinfo.`type` = 'roundtrip'
and `price` <> 0
group by pdata.`date`, year(pinfo.`date`) desc, month(pinfo.`date`) desc, day(pinfo.`date`) desc
) base
group by `date`
order by price, year(`date`) desc, month(`date`) desc, day(`date`) asc
) minpriceperdate
group by year(`date`) desc, month(`date`) desc
It takes 0.015 s without the cache; the table sizes can be seen from the AUTO_INCREMENT values.
SELECT MIN(price) AS min_price,
LEFT(date, 7) AS yyyy_mm
FROM aircrafts_in_parsed_data
GROUP BY LEFT(date, 7)
will get the lowest price for each month. But it can't say 'first'.
From my groupwise-max cheat-sheet, I derive this:
SELECT
yyyy_mm, date, price, airline -- The desired columns
FROM
( SELECT @prev := '' ) init
JOIN
( SELECT LEFT(date, 7) != @prev AS first,
@prev := LEFT(date, 7),
LEFT(date, 7) AS yyyy_mm, date, price, airline
FROM aircrafts_in_parsed_data
ORDER BY
LEFT(date, 7), -- The 'GROUP BY'
price ASC, -- ASC to do "MIN()"
date -- To get the 'first' if there are dup prices for a month
) x
WHERE first -- extract only the first of the lowest price for each month
ORDER BY yyyy_mm; -- Whatever you like
Sorry, but subqueries are necessary. (I avoided YEAR(), MONTH(), and DAY().)
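On servers with window functions (MySQL 8.0+, SQLite 3.25+), ROW_NUMBER() is a different technique for the same groupwise-min problem, with no user variables involved. The sketch below runs it in SQLite through Python's sqlite3. The function name and sample rows are hypothetical, and the join to aircrafts_in_parsed_info is left out for brevity.

```python
import sqlite3

def cheapest_per_month(rows):
    """rows: (price, airline, date 'YYYY-MM-DD') tuples. Returns one row per month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE aircrafts_in_parsed_data ("
                 "id INTEGER PRIMARY KEY, price INTEGER NOT NULL, "
                 "airline TEXT NOT NULL, date TEXT NOT NULL)")
    conn.executemany("INSERT INTO aircrafts_in_parsed_data (price, airline, date) "
                     "VALUES (?, ?, ?)", rows)
    result = conn.execute("""
        SELECT yyyy_mm, date, price, airline
        FROM (
            SELECT substr(date, 1, 7) AS yyyy_mm, date, price, airline,
                   ROW_NUMBER() OVER (
                       PARTITION BY substr(date, 1, 7)   -- one partition per month
                       ORDER BY price, date, id          -- cheapest first, then earliest date
                   ) AS rn
            FROM aircrafts_in_parsed_data
            WHERE price <> 0
        )
        WHERE rn = 1
        ORDER BY yyyy_mm
    """).fetchall()
    conn.close()
    return result
```

When two dates in a month share the lowest price, the ORDER BY inside OVER() picks the earlier date. This gives the "first" row the task asks for.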
You are right, your query is not correct.
Let's start with the innermost query: you group by pdata.date + pinfo.date, so you get one result row per date combination. As you don't specify which price or airline you are interested in for each date combination (such as MAX(airline) and MIN(price)), you get one arbitrarily chosen airline and one arbitrarily chosen price per date combination. These don't even have to belong to the same record in the table; the DBMS is free to choose one airline and one price matching the dates. Well, maybe the combination of pdata.date and pinfo.date is already unique, but then you wouldn't have to group by at all. So however we look at this, it isn't proper.
In the next query you group by pdata.date only, thus again getting arbitrary matches for airline and price. You could have done that in the innermost query already. It makes no sense to say: "give me a randomly picked price per pdata.date and pinfo.date, and from these give me a randomly picked price per pdata.date"; you could just as well say it directly: "give me a randomly picked price per pdata.date". Then you order your result rows. This is completely useless, as you are using the results as a subquery (derived table) again, and a derived table is considered an unordered set. So the ORDER BY gives the DBMS more work to do, but is in no way guaranteed to influence the main query's results.
In your main query then you group by year and month, again resulting in arbitrarily picked values.
Here is the same query a tad shorter and cleaner:
select
pdata.airline, -- some arbitrarily chosen airline matching year and month
pdata.price, -- some arbitrarily chosen price matching year and month
pdata.date -- some arbitrarily chosen date matching year and month
from aircrafts_in_parsed_data pdata
inner join aircrafts_in_parsed_info pinfo on pdata.info_id = pinfo.id
where pinfo.aircrafts_in_id = {$id}
and pinfo.status = 'success'
and pinfo.type = 'roundtrip'
and pdata.price <> 0
group by year(pdata.date), month(pdata.date)
order by year(pdata.date) desc, month(pdata.date) desc
As to the original task (as far as I understand it): Find the records with the lowest price per month. Per month means GROUP BY month. The lowest price is MIN(price).
select
min_price_record.departure_year,
min_price_record.departure_month,
min_price_record.min_price,
full_record.departure_date,
full_record.airline
from
(
select
year(`date`) as departure_year,
month(`date`) as departure_month,
min(price) as min_price
from aircrafts_in_parsed_data
where price <> 0
and info_id in
(
select id
from aircrafts_in_parsed_info
where aircrafts_in_id = {$id}
and status = 'success'
and type = 'roundtrip'
)
group by year(`date`), month(`date`)
) min_price_record
join
(
select
`date` as departure_date,
year(`date`) as departure_year,
month(`date`) as departure_month,
price,
airline
from aircrafts_in_parsed_data
where price <> 0
and info_id in
(
select id
from aircrafts_in_parsed_info
where aircrafts_in_id = {$id}
and status = 'success'
and type = 'roundtrip'
)
) full_record on full_record.departure_year = min_price_record.departure_year
and full_record.departure_month = min_price_record.departure_month
and full_record.price = min_price_record.min_price
order by
min_price_record.departure_year desc,
min_price_record.departure_month desc;
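The join-back pattern above (aggregate the minimum per month, then join back to the full rows) can be tried in a reduced sketch. It uses SQLite through Python's sqlite3, with substr(date, 1, 7) in place of year()/month(). The function name and sample rows are hypothetical, and the aircrafts_in_parsed_info filter is omitted.

```python
import sqlite3

def min_price_rows(rows):
    """rows: (price, airline, date 'YYYY-MM-DD'). Returns every row at its month's minimum."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE aircrafts_in_parsed_data ("
                 "id INTEGER PRIMARY KEY, price INTEGER NOT NULL, "
                 "airline TEXT NOT NULL, date TEXT NOT NULL)")
    conn.executemany("INSERT INTO aircrafts_in_parsed_data (price, airline, date) "
                     "VALUES (?, ?, ?)", rows)
    result = conn.execute("""
        SELECT full_record.date, full_record.price, full_record.airline
        FROM (
            -- step 1: the minimum price per month
            SELECT substr(date, 1, 7) AS yyyy_mm, MIN(price) AS min_price
            FROM aircrafts_in_parsed_data
            WHERE price <> 0
            GROUP BY substr(date, 1, 7)
        ) AS min_price_record
        -- step 2: join back to find the rows carrying that price
        JOIN aircrafts_in_parsed_data AS full_record
          ON substr(full_record.date, 1, 7) = min_price_record.yyyy_mm
         AND full_record.price = min_price_record.min_price
        ORDER BY full_record.date
    """).fetchall()
    conn.close()
    return result
```

With this pattern, every row that ties for a month's minimum price comes back. Keeping only the first date per month therefore needs one more step, such as a MIN(date) over the joined result.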
I have an event/calendar MySQL table where each user has multiple appointments/events throughout the day. If one user can't make an appointment/event (because he/she is running behind on another appointment), I need to be able to re-assign this appointment to a different available user. So I need to display a suggestion of the top 5 users who are available for the scheduled time frame and can take this appointment; a manager will then be able to re-assign the appointment to one of the suggested users.
My events table looks something like this
CREATE TABLE `calendar_events` (
`event_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`start_on` datetime NOT NULL,
`end_on` datetime NOT NULL,
`subject` varchar(255) NOT NULL,
`event_type` enum('Phone Call','Meeting','Event','Appointment','Other') CHARACTER SET latin1 COLLATE latin1_general_ci NOT NULL DEFAULT 'Phone Call',
`all_day_event` tinyint(1) DEFAULT '0' COMMENT '1 = all day event, 0 = no',
`phone_call_id` int(11) unsigned DEFAULT NULL,
`account_id` int(11) unsigned DEFAULT NULL,
`client_id` int(11) unsigned DEFAULT NULL,
`owner_id` int(11) unsigned NOT NULL,
`created_by` int(11) unsigned NOT NULL,
`created_on` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`modified_by` int(11) unsigned DEFAULT NULL,
`modified_on` datetime DEFAULT NULL,
`event_location` varchar(255) DEFAULT NULL,
`event_notes` varchar(10000) DEFAULT NULL,
`status` tinyint(1) NOT NULL DEFAULT '1' COMMENT '0 = purged, 1 = active, 2 = pass, 3 = canceled, 5 = waiting for auditor to be enabled',
PRIMARY KEY (`event_id`),
UNIQUE KEY `phone_call_id` (`phone_call_id`,`account_id`,`client_id`),
KEY `client_id` (`client_id`),
KEY `account_id` (`account_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
So let's say event_id = 100 is assigned to user_id = 2 and scheduled with start_on = '2014-09-21 10:00:00' and end_on = '2014-09-21 10:00:00',
and user_id = 5 has an appointment with start_on = '2014-09-21 11:45:00' and end_on = '2014-09-21 12:30:00',
and user_id = 2 cannot make his appointment scheduled for '2014-09-21 10:00:00', so the system will suggest user_id = 5, as he will be available for the following 105 minutes.
The final data set will need to be:
event_id org_owner suggested_owner available_for
100 2 5 105
The following query gives me a list of all available users from the users table, along with start_on/end_on values if the user has an event scheduled (one user can have multiple records). If start_on is NULL in this query, the user does not have any event; otherwise it returns the start of each event.
So if a user ID appears in the result with a NULL value in the start_on column, this user is available all day and should be one of the 5 recommended users, because they have one of the highest availabilities. But if a user has one or more rows in the data set with a non-NULL start_on, we need to look at the start_on closest to the event and then recommend the top 5 users with the greatest availability.
SELECT user_id, start_on, end_on, subject
FROM view_users AS su
LEFT JOIN calendar_events AS c ON c.owner_id = su.user_id AND c.start_on NOT BETWEEN '2014-09-30 00:00:00' AND '2014-09-30 23:59:59' AND c.status = 1
WHERE su.is_available_today = 1
How can I extract this data set?
First proposal, edited thanks to your help. It still needs to take care of users that don't have any events (this could be achieved with a LEFT JOIN in the 't' subquery). This could be improved a lot, but right now I'm a bit tired :)
SELECT
c.event_id, -- Event id
c.owner_id AS org_owner, -- Original owner of event
t.owner_id AS suggested_owner, -- Suggested new user
c.start_on, -- Event start
t.free_from, -- Owner free slot start
t.free_to, -- Owner free slot end
TIME_TO_SEC( TIMEDIFF( t.free_to, c.start_on ) ) /60 AS available_for -- Availability in minutes (difference between event start and free slot end)
FROM calendar_events AS c
-- Join with free slots
LEFT JOIN (
-- Add a slot for the beginning: 1900-01-01 to the first event start
SELECT * FROM (
SELECT owner_id, '1900-01-01' AS free_from, MIN( start_on ) AS free_to
FROM calendar_events c3
GROUP BY owner_id
) AS deb
UNION
-- select free slots by taking the event end and the following event start
SELECT owner_id, `end_on` AS free_from, (
SELECT start_on
FROM calendar_events c2
WHERE c2.owner_id = c1.owner_id
AND c2.start_on > c1.end_on
ORDER BY c2.start_on
LIMIT 0 , 1
) AS free_to
FROM calendar_events c1
UNION
-- Add a slot for end, last event end to 2100-01-01
SELECT * FROM (
SELECT owner_id, MAX( end_on ) AS free_from, '2100-01-01' AS free_to
FROM calendar_events c3
GROUP BY owner_id
) AS end
) AS t ON t.owner_id <> c.owner_id
-- Join avoid using same user and ensure free slot matches event dates
AND t.free_from <= c.start_on AND t.free_to >= c.end_on
WHERE c.status = 1
AND c.event_id =52
GROUP BY t.owner_id -- To avoid multiple free slots by user
ORDER BY available_for DESC -- Sort to list biggest slots first
LIMIT 0, 5 -- Only five first matching users
Good luck :)
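The free-slot idea in the query above can also be sketched in plain Python instead of SQL, which may make the logic easier to follow. A free slot runs from the end of one event to the start of the next, bounded by 1900-01-01 and 2100-01-01. A candidate qualifies when one of their slots covers the whole target event. All names here are hypothetical. Unlike the query, this sketch merges overlapping events and includes owners who have no events at all (pass an empty list for them).

```python
from datetime import datetime

# Sentinels mirroring the '1900-01-01' / '2100-01-01' bounds used in the query.
EARLIEST = datetime(1900, 1, 1)
LATEST = datetime(2100, 1, 1)

def free_slots(events):
    """Return (free_from, free_to) gaps between one owner's (start, end) events."""
    slots = []
    cursor = EARLIEST
    for start, end in sorted(events):
        if start > cursor:
            slots.append((cursor, start))
        cursor = max(cursor, end)
    slots.append((cursor, LATEST))
    return slots

def suggest_owners(org_owner, start_on, end_on, events_by_owner, limit=5):
    """Rank other owners by how many minutes they stay free after start_on."""
    suggestions = []
    for owner, events in events_by_owner.items():
        if owner == org_owner:
            continue
        for free_from, free_to in free_slots(events):
            # Same test as the JOIN: the free slot must cover the whole event.
            if free_from <= start_on and free_to >= end_on:
                minutes = int((free_to - start_on).total_seconds() // 60)
                suggestions.append((owner, minutes))
                break
    suggestions.sort(key=lambda s: s[1], reverse=True)
    return suggestions[:limit]
```

With user 5 busy from 11:45 and a 10:00 to 10:30 target event, user 5 gets 105 minutes. A user whose earlier event runs past 10:00 is not suggested at all.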
How about this:
SELECT event_id, owner_id, start_on INTO @event_id, @user, @start_on
FROM calendar_events WHERE event_id = 100;
SELECT @event_id event_id,
@user org_owner,
u.user_id suggested_owner,
TIMESTAMPDIFF(MINUTE, @start_on, COALESCE(c.min_start, DATE(@start_on) + INTERVAL 18 HOUR)) available_for
FROM
users u
LEFT JOIN
(SELECT
owner_id,
MIN(start_on) AS min_start
FROM
calendar_events
WHERE
(start_on BETWEEN @start_on AND DATE(@start_on) + INTERVAL 18 HOUR)
OR
(start_on BETWEEN DATE(@start_on) AND DATE(@start_on) + INTERVAL 18 HOUR AND all_day_event = 1)
GROUP BY owner_id
) c
ON u.user_id = c.owner_id
WHERE u.user_id <> @user
ORDER BY available_for DESC
LIMIT 5
You may have to adjust the INTERVAL; I just assumed the day ends at 6 P.M.
Try this:
SELECT
co.event_id,
co.owner_id org_owner,
su.user_id suggested_owner,
ifnull(min((to_seconds(c.start_on) - to_seconds(co.start_on)) / 60), 999) available
FROM calendar_events co
CROSS JOIN view_users su
LEFT JOIN calendar_events c ON c.owner_id = su.user_id
AND c.start_on BETWEEN co.start_on AND date(adddate(co.start_on, 1))
AND c.status = 1
WHERE co.event_id = 100
AND su.is_available_today = 1
GROUP BY 1, 2, 3
ORDER BY 4 DESC
LIMIT 5
Users that have no appointments for the rest of the day after the target event starts get assigned the available value of 999, putting them at the top of the list.
The next event for each user is found using min() over the time gap, and all users are sorted largest time gap first; then LIMIT gives you the top 5.
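The next-event approach from this answer can be reproduced in SQLite through Python's sqlite3. Here strftime('%s') replaces TO_SECONDS, and the original owner is excluded as in the previous answer. The sample data follows the question's example, except that end_on is assumed to be 10:30, and users 7 (no events) and 9 are made up.

```python
import sqlite3

def suggest_by_next_event(events, users, event_id, limit=5):
    """events: (event_id, owner_id, start_on, end_on); users: user_id list."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE view_users (user_id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE calendar_events (event_id INTEGER PRIMARY KEY, "
                 "owner_id INTEGER NOT NULL, start_on TEXT NOT NULL, "
                 "end_on TEXT NOT NULL, status INTEGER NOT NULL DEFAULT 1)")
    conn.executemany("INSERT INTO view_users (user_id) VALUES (?)", [(u,) for u in users])
    conn.executemany("INSERT INTO calendar_events (event_id, owner_id, start_on, end_on) "
                     "VALUES (?, ?, ?, ?)", events)
    rows = conn.execute("""
        SELECT co.event_id,
               co.owner_id AS org_owner,
               su.user_id AS suggested_owner,
               -- minutes until the user's next event that day; 999 if there is none
               COALESCE(MIN((strftime('%s', c.start_on) - strftime('%s', co.start_on)) / 60),
                        999) AS available_for
        FROM calendar_events co
        CROSS JOIN view_users su
        LEFT JOIN calendar_events c
               ON c.owner_id = su.user_id
              AND c.start_on >= co.start_on
              AND c.start_on < date(co.start_on, '+1 day')
              AND c.status = 1
        WHERE co.event_id = ?
          AND su.user_id <> co.owner_id
        GROUP BY co.event_id, co.owner_id, su.user_id
        ORDER BY available_for DESC
        LIMIT ?
    """, (event_id, limit)).fetchall()
    conn.close()
    return rows
```

For event 100, user 7 comes first with the 999 sentinel, followed by user 5 with 105 minutes and user 9 with 30 minutes.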
OK I think I have messed up somewhere but maybe someone can spot my error, because I have little clue of what I am doing.
I have two tables, Players and RegionPlayer (see the bottom for their structure).
I am trying to find when none of the players in a region have been seen in a while. Players can be on vacation, which gives them 58 days; otherwise it's only 8 days.
If none of the players in a region have been seen in that time, I want the SQL query to return the region ID, as well as the most recently seen person in that region.
I think the way to do this is to get two results for each region: the most recently seen player who was on vacation, and the most recently seen player who was not on vacation.
I thought the following query would give me that, but it doesn't seem to.
SELECT RegionPlayer.Regionid, Players.key, Players.Name, Players.Seen, Players.Vacation
FROM RegionPlayer
JOIN Players
ON Players.Key = RegionPlayer.Playerid
where ( RegionPlayer.Status = 1 )
GROUP BY RegionPlayer.Regionid DESC, Players.Vacation DESC
ORDER BY Players.Seen DESC
Then I need to be able to tell who has not been seen in a while; this should give me that.
I know I can link both queries together, but I have no idea how; it has been many years since I last had to put this much effort into SQL statements.
Select Players.key FROM Players
WHERE
(( Players.Vacation != 1 ) AND
( Players.Seen <= (NOW() - INTERVAL 8 DAY ) ))
OR
(( Players.Vacation != 0 ) AND
( Players.Seen <= (NOW() - INTERVAL 58 DAY ) ))
Is there a better way of doing this? I vaguely remember things like views, stored procedures, and functions; would one or more of them be better?
Table Structure.
Please forgive the names of the tables and some of the structure. This is an example of why deciding things late at night after half a bottle of wine is a bad idea.
CREATE TABLE IF NOT EXISTS `Players` (
`key` int(11) NOT NULL,
`Name` varchar(255) NOT NULL,
`Vacation` varchar(1) NOT NULL,
`Seen` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`Modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
)
CREATE TABLE IF NOT EXISTS `RegionPlayer` (
`Key` int(11) NOT NULL,
`Playerid` int(11) NOT NULL,
`Regionid` int(11) NOT NULL,
`Type` varchar(1) NOT NULL,
`Status` int(1) NOT NULL DEFAULT '1',
`Modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`Created` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00'
)
I've put up an SQLFiddle.
The query that answers your basic requirement, which seems to be: list all regions that have no active player seen in the last 8 days and no on-vacation player seen in the last 58 days, also giving the data of the last seen player in that region:
SELECT r.*
FROM (
SELECT rp.Regionid, p.Key, p.Name, p.Vacation, p.Seen
FROM RegionPlayer rp
JOIN Players p ON p.Key = rp.Playerid
WHERE rp.Status = 1
GROUP BY rp.Regionid
ORDER BY p.Seen DESC
) r
WHERE ((r.Vacation != 1) AND (r.Seen <= (NOW()-INTERVAL 8 DAY)))
OR ((r.Vacation != 0) AND (r.Seen <= (NOW()-INTERVAL 58 DAY)));
I assumed from your SQL that only RegionPlayer rows with a Status of 1 should be considered.
On the SQLFiddle I've created a few regions with different combinations, and this query does its job.
As to your first SQL statement: you say it doesn't work as expected, but it seems to me that it does return the last seen active player and the last seen on-vacation player for each region. The sorting may not make it very readable, but it does do that.
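In a GROUP BY query, a non-aggregated column may take its value from any row of the group, so "most recently seen" is not strictly guaranteed by the query above. A more deterministic variant uses NOT EXISTS for "nobody in the region was seen within their own threshold" and joins back to MAX(Seen) for the latest player. The sketch runs it in SQLite through Python's sqlite3 under these assumptions:

- RegionPlayer is reduced to Playerid, Regionid and Status.
- Vacation is stored as '0' or '1'.
- datetime(:now, '-N days') replaces NOW() - INTERVAL N DAY.
- The function name and sample data are hypothetical.

```python
import sqlite3

def stale_regions(players, memberships, now):
    """players: (key, Name, Vacation, Seen); memberships: (Playerid, Regionid, Status)."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE Players ("key" INTEGER PRIMARY KEY, Name TEXT, '
                 'Vacation TEXT, Seen TEXT)')
    conn.execute("CREATE TABLE RegionPlayer (Playerid INTEGER, Regionid INTEGER, Status INTEGER)")
    conn.executemany("INSERT INTO Players VALUES (?, ?, ?, ?)", players)
    conn.executemany("INSERT INTO RegionPlayer VALUES (?, ?, ?)", memberships)
    rows = conn.execute("""
        WITH active AS (
            SELECT rp.Regionid, p."key" AS player_key, p.Name, p.Vacation, p.Seen
            FROM RegionPlayer rp
            JOIN Players p ON p."key" = rp.Playerid
            WHERE rp.Status = 1
        ),
        latest AS (
            SELECT Regionid, MAX(Seen) AS last_seen FROM active GROUP BY Regionid
        )
        SELECT a.Regionid, a.player_key, a.Name, a.Seen
        FROM active a
        JOIN latest l ON l.Regionid = a.Regionid AND l.last_seen = a.Seen
        WHERE NOT EXISTS (
            -- is anyone in this region still within their own threshold?
            SELECT 1 FROM active f
            WHERE f.Regionid = a.Regionid
              AND f.Seen > datetime(:now, CASE WHEN f.Vacation = '1'
                                               THEN '-58 days' ELSE '-8 days' END)
        )
        ORDER BY a.Regionid
    """, {"now": now}).fetchall()
    conn.close()
    return rows
```

Regions where at least one player is still "fresh" are dropped. If two players in a stale region share the same latest Seen value, both rows are returned.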
Try this
SELECT RegionPlayer.Regionid, m.key, m.Name, m.Seen, m.Vacation
FROM RegionPlayer
JOIN (SELECT * FROM Players
WHERE
(( Players.Vacation != 1 ) AND
( Players.Seen <= (NOW() - INTERVAL 8 DAY ) ))
OR
(( Players.Vacation != 0 ) AND
( Players.Seen <= (NOW() - INTERVAL 58 DAY ) ))) m
ON m.Key = RegionPlayer.Playerid
where ( RegionPlayer.Status = 1 )
GROUP BY RegionPlayer.Regionid DESC, m.Vacation DESC
ORDER BY m.Seen DESC
Let's say we have a table named record with 4 fields:
id (INT 11 AUTO_INC)
email (VAR 50)
timestamp (INT 11)
status (INT 1)
And the table contains the following data:
Now we can see that the email address test@example.com was duplicated 4 times (the record with the lowest timestamp is the original one, and all copies after that are duplicates). I can easily count the number of unique records using
SELECT COUNT(DISTINCT email) FROM record
I can also easily find out which email address was duplicated how many times using
SELECT email, count(id) FROM record GROUP BY email HAVING COUNT(id)>1
But now the business question is
How many times STATUS was 1 on all the Duplicate Records?
For example:
For test@example.com there was no duplicate record having status 1
For second@example.com there was 1 duplicate record having status 1
For third@example.com there was 1 duplicate record having status 1
For four@example.com there was no duplicate record having status 1
For five@example.com there were 2 duplicate records having status 1
So the sum of all the numbers is 0 + 1 + 1 + 0 + 2 = 4,
which means there were 4 duplicate records with status = 1 in the table.
Question
How many Duplicate records have status = 1 ?
This is a new solution that works better. It removes the first entry for each email and then counts the rest. It's not easy to read, and if possible I would write this in a stored procedure, but it works.
select sum(status)
from dude d1
join (select email,
min(ts) as ts
from dude
group by email) mins
using (email)
where d1.ts != mins.ts;
original answer below
Your own query to find "which email address was duplicated how many times using"
SELECT email,
count(id) as duplicates
FROM record
GROUP BY email
HAVING COUNT(id)>1
can easily be modified to answer "How many Duplicate records have status = 1"
SELECT email,
count(id) as duplicates_status_sum
FROM record
WHERE status = 1
GROUP BY email
HAVING COUNT(id)>1
Both of these queries count the original row as well, so they actually return "duplicates including the original one". You can subtract 1 from the counts if the original one always has status 1.
SELECT email,
count(id) -1 as true_duplicates
FROM record
GROUP BY email
HAVING COUNT(id)>1
SELECT email,
count(id) -1 as true_duplicates_status_sum
FROM record
WHERE status = 1
GROUP BY email
HAVING COUNT(id)>1
If I understand you correctly, your query should be:
SELECT `email` , COUNT( `id` ) AS `tot`
FROM `record` , (
SELECT `email` AS `emt` , MIN( `timestamp` ) AS `mtm`
FROM `record`
GROUP BY `email`
) AS `temp`
WHERE `email` = `emt`
AND `timestamp` > `mtm`
AND `status` =1
GROUP BY `email`
HAVING COUNT( `id` ) >=1
First we get the minimum timestamp for each email, and then find the duplicate records that were inserted after this timestamp and have status 1.
If you want the total sum then the query is
SELECT SUM( `tot` ) AS `duplicatesWithStatus1`
FROM (
SELECT `email` , COUNT( `id` ) AS `tot`
FROM `record` , (
SELECT `email` AS `emt` , MIN( `timestamp` ) AS `mtm`
FROM `record`
GROUP BY `email`
) AS `temp`
WHERE `email` = `emt`
AND `timestamp` > `mtm`
AND `status` =1
GROUP BY `email`
HAVING COUNT( `id` ) >=1
) AS t
Hope this is what you want
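The "minimum timestamp per email, then count later rows with status 1" logic can be checked with a small sketch in SQLite through Python's sqlite3. The question's actual table data is not shown, so the rows in the test are invented to reproduce its 0 + 1 + 1 + 0 + 2 = 4 breakdown. The function name is hypothetical.

```python
import sqlite3

def duplicate_status_counts(rows):
    """rows: (email, timestamp, status). Returns ({email: count}, total)."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE record (id INTEGER PRIMARY KEY, email TEXT, '
                 '"timestamp" INTEGER, status INTEGER)')
    conn.executemany('INSERT INTO record (email, "timestamp", status) VALUES (?, ?, ?)', rows)
    per_email = dict(conn.execute("""
        SELECT r.email,
               -- only rows after the original (earliest) one, and only status 1
               SUM(CASE WHEN r."timestamp" > f.first_ts AND r.status = 1 THEN 1 ELSE 0 END)
        FROM record r
        JOIN (SELECT email, MIN("timestamp") AS first_ts FROM record GROUP BY email) f
          ON f.email = r.email
        GROUP BY r.email
    """).fetchall())
    conn.close()
    return per_email, sum(per_email.values())
```

The original row's own status is ignored, because only rows strictly after the earliest timestamp are counted. If two rows share the earliest timestamp, both are treated as originals.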
You can get the count of duplicate records that have status = 1 with
select count(*) as Duplicate_Record_Count
from (select r.email
from record r
where r.status=1
group by r.email,r.status
having count(r.email)>1 ) t1
The following query returns each duplicate email with its count of status = 1 duplicates and its earliest timestamp:
select r.email,count(*)-1 as Duplicate_Count,min(r.timestamp) as timestamp
from record r
where r.status=1
group by r.email
having count(r.email)>1